Abstract. A regular tree language L is locally testable if membership of a tree in L
depends only on the presence or absence of some fixed set of neighborhoods in the tree. In
this paper we show that it is decidable whether a regular tree language is locally testable.
The decidability is shown for ranked trees and for unranked unordered trees.



This paper is part of a general program trying to understand the expressive power of
first-order logic over trees. We say that a class of regular tree languages has a decidable
characterization if the following problem is decidable: given as input a finite tree
automaton, decide if the recognized language belongs to the class in question. Usually a decision
algorithm requires a solid understanding of the expressive power of the corresponding class
and is therefore useful in any context where a precise boundary of this expressive power
is crucial. In particular we do not yet possess a decidable characterization of the tree
languages definable in FO(<), the first-order logic using a binary predicate < for the ancestor
relation.

We consider here the class of tree languages definable in a fragment of FO(<) known 
as Locally Testable (LT). A language is in LT if membership in the language depends only 
on the presence or absence of neighborhoods of a certain size in the tree. A closely related 
family of languages is the class LTT of Locally Threshold Testable languages. Membership 
in such languages is determined by counting the number of neighborhoods of a certain size 
up to some threshold. The class LT is the special case where no counting is done, that is,
the threshold is 1. In this paper we provide a decidable characterization of the class LT over
trees.

The standard approach for deriving a decidable characterization is to first exhibit a set 
of closure properties that hold exactly for the languages in the class under investigation 
and then show that these closure properties can be automatically tested. This requires 
a formalism for expressing the desired closure properties but also some tools, typically 
induction mechanisms, for proving that the properties do characterize the class, and for 
proving the decidability of those properties. 

Over words one formalism turned out to be successful for characterizing many classes
of regular languages. The closure properties are expressed as identities on the syntactic
monoid or syntactic semigroup of the regular language. The syntactic monoid or syntactic
semigroup of a regular language is the transition monoid of its minimal deterministic
automaton, including or not the transition induced by the empty word. For instance the
class of word languages definable in FO(<) is characterized by the fact that the syntactic
monoid of any such language is aperiodic. The latter property corresponds to the identity
$x^\omega = x^{\omega+1}$ where $\omega$ is the size of the monoid. This equation is easily verifiable
automatically on the syntactic monoid. Similarly, the classes LTT and LT have been characterized using
decidable identities on the syntactic semigroup [BS73, McN74, BP89, TW85].

Over trees the situation is more complex and right now there is no formalism that can
easily express all the known closure properties for the classes for which we have a decidable
characterization. The most successful formalism is certainly the one introduced in [BW07],
known as forest algebras. For instance, these forest algebras were used for obtaining
decidable characterizations for the classes of tree languages definable in EF+EX [BW06],
EF+F$^{-1}$ [Boj07b, Pla08], BC-$\Sigma_1(<)$ [BSS08, Pla08], $\Delta_2(<)$ [BS08, Pla08]. However it is
not clear yet how to use forest algebras in a simple way for characterizing the class LTT
over trees and a different formalism was used for obtaining a decidable characterization for
this class [BS10].

We were not able to obtain a reasonable set of identities for LT either by using forest 
algebras or the formalism used for characterizing LTT. Our approach is slightly different. 

There is another technique that was used on words for deciding the class LT. It is based
on the "delay theorem" [Str85, Til87] for computing the required size of the neighborhoods:
Given an automaton recognizing the language L, a number k can be computed from that
automaton such that if L is in LT then it is in LT by investigating the neighborhoods of
size k. Once this k is available, deciding whether L is indeed in LT or not is a simple
exercise. On words, a decision algorithm for LT (and also for LTT) has been obtained
successfully using this approach [Boj07a]. Unfortunately all efforts to prove a similar delay
theorem on trees have failed so far.

We obtain a decidable characterization of LT by combining the two approaches
mentioned above. We first exhibit a set of necessary conditions for a regular tree language to
be in LT. Those conditions are expressed using the formalism introduced for characterizing
LTT. We then show that for languages satisfying such conditions one can compute the
required size of the neighborhoods. Using this technique we obtain a characterization of LT
for ranked trees and for unranked unordered trees.

Other related work. There exist several formalisms that have been used for expressing
identities corresponding to several classes of languages but not in a decidable way. Among
them let us mention the notion of preclones introduced in [EW05] as it is close to the one
we use in this paper for expressing our necessary conditions.

Finally we mention the class of frontier testable languages, not expressible in FO(<),
that was given a decidable characterization using a specific formalism [Wil96].

Organization of the paper. We start with ranked trees and give the necessary notations
and preliminary results in Section 2. Section 3 exhibits several conditions and proves they
are necessary for being in LT. In Section 4 we show that for the languages satisfying the
necessary conditions the required size of the neighborhoods can be computed, hence
concluding the decidability of the characterization. Finally in Section 5 we show how our result
extends to unranked trees.

2. Notations and preliminaries 

We first investigate the case of binary trees. The case of unranked unordered trees will be
considered in Section 5.

Trees. We fix a finite alphabet $\Sigma$, and consider finite binary trees with labels in $\Sigma$. All the
results presented here extend to arbitrary ranks in a straightforward way. In the binary
case, each node of the tree is either a leaf (has no children) or has exactly two children,
the left child and the right child. We use the standard terminology for trees. For instance
by the descendant (resp. ancestor) relation we mean the reflexive transitive closure of the
child (resp. inverse of child) relation and by distance between two nodes we refer to the
length of the shortest path between the two nodes. A language is a set of trees.

Given a tree t and a node x of t, the subtree of t rooted at x, consisting of all the nodes of t
that are descendants of x, is denoted by $t|_x$. A context is a tree with a designated (unlabeled)
leaf called its port which acts as a hole. Given contexts C and C', their concatenation $C \cdot C'$
is the context formed by identifying the root of C' with the port of C. A tree $C \cdot t$ can be
obtained similarly by combining a context C and a tree t. Given a tree t and two nodes
x, y of t such that y is a descendant (not necessarily strict) of x, the context of t between x
and y, denoted by t[x,y], is defined by keeping all the nodes of t that are descendants of x
but not descendants of y and by placing the port at y.

We say that a context C occurs in t if C is the context of t between x and y for some 
nodes x and y of t. 

Types. Let t be a tree, x a node of t and k a positive integer. The k-type of x is
the (isomorphism type of the) restriction of $t|_x$ to the set of nodes of t at distance at most
k from x. When k is clear from the context we will simply say type. A k-type $\tau$ occurs
in a tree t if there exists a node of t of type $\tau$. If C is the context t[x, y] for some tree t and
some nodes x, y of t, then the k-type of a node of C is the k-type of the corresponding node
in t. Notice that the k-type of a node of C depends on the surrounding tree t; in particular
the port of C has a k-type, the one of y in t.

Given two trees t and t' we denote by $t \preceq_k t'$ the fact that all k-types that occur in t
also occur in t'. Similarly we can speak of $t \preceq_k C$ when t is a tree and C is t'[x, y] for some
tree t' and some nodes x, y of t'. We denote by $t \simeq_k t'$ the property that the root of t and
the root of t' have the same k-type and t and t' agree on their k-types: $t \preceq_k t'$ and $t' \preceq_k t$.
Note that when k is fixed the number of k-types is finite and hence the equivalence relation
$\simeq_k$ has a finite number of equivalence classes. This property is no longer true for unranked
trees and this is why we will have to use a different technique for this case.

A language L is said to be $\kappa$-locally testable if L is a union of equivalence classes of $\simeq_\kappa$.
A language is said to be locally testable (is in LT) if there is a $\kappa$ such that it is $\kappa$-locally
testable. In other words, in order to test whether a tree t belongs to L it is enough to check
for the presence or absence of $\kappa$-types in t, for some big enough $\kappa$.
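
The definitions of types and of the equivalence between trees can be illustrated by a minimal
Python sketch. The encoding of trees as nested tuples and the names `truncate`, `k_types` and
`k_equivalent` are assumptions made for this illustration only; they are not taken from the paper.

```python
# Binary trees as nested tuples: a leaf is (label,), an inner node is
# (label, left, right). Encoding and function names are illustrative assumptions.

def truncate(t, k):
    """k-type of the root of t: t restricted to the nodes at distance <= k."""
    if k == 0 or len(t) == 1:
        return (t[0],)
    return (t[0], truncate(t[1], k - 1), truncate(t[2], k - 1))

def k_types(t, k):
    """Set of all k-types occurring at some node of t."""
    types = {truncate(t, k)}
    if len(t) == 3:
        types |= k_types(t[1], k) | k_types(t[2], k)
    return types

def k_equivalent(t1, t2, k):
    """Same k-type at the root and same set of occurring k-types."""
    return truncate(t1, k) == truncate(t2, k) and k_types(t1, k) == k_types(t2, k)

b = ("b",)
ab = ("a", b, b)
t3 = ("a", ab, ab)
t4 = ("a", t3, ab)
```

In this example `t3` and `t4` agree on their 1-types but not on their 2-types, so no 1-locally
testable language can separate them, while a 2-locally testable language may.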

Regular Languages. We assume familiarity with tree automata and regular tree languages.
The interested reader is referred to [CGJ+07] for more details. Their precise definitions
are not important in order to understand our characterization. However pumping
arguments will be used in the decision algorithms.

The problem. We want an algorithm deciding if a given regular language is in LT. When
the complexity is not an issue, we can assume that the language L is given as an MSO
formula. Another option would be to start with a bottom-up tree automaton for L or,
even better, the minimal deterministic bottom-up tree automaton that recognizes L. We
will come back to the complexity issues in Section 7. The main difficulty is to compute a
bound on $\kappa$, the size of the neighborhood, whenever such a $\kappa$ exists.

The word case is a special case of the tree case as it corresponds to trees of rank 1. A
decision procedure for LT was obtained in the word case independently by [BS73] and [McN74].
A language L is in LT if and only if its syntactic semigroup satisfies the equations exe =
exexe and exeye = eyexe, where e is an arbitrary idempotent (ee = e) while x and y are
arbitrary elements of the semigroup. The equations are then easily verified after computing
the syntactic semigroup.
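
For the word case, the verification of these two equations can be sketched as follows. This is
a minimal illustration, assuming that the input is the transition table of a complete minimal
deterministic automaton, so that its transition semigroup is the syntactic semigroup. The function
names and the two example automata are hypothetical. The code uses the fact that, since ee = e,
the two equations say exactly that the elements of the form exe are idempotent and commute.

```python
# A transformation of the state set {0, ..., n-1} is a tuple f, with f[q] the
# image of state q. All names below are illustrative assumptions.

def compose(f, g):
    """Apply f first, then g."""
    return tuple(g[q] for q in f)

def transition_semigroup(letters):
    """Close the letter transformations under composition (non-empty words)."""
    gens = set(letters.values())
    elems, frontier = set(gens), list(gens)
    while frontier:
        f = frontier.pop()
        for g in gens:
            h = compose(f, g)
            if h not in elems:
                elems.add(h)
                frontier.append(h)
    return elems

def satisfies_lt_identities(letters):
    """Check exe = exexe and exeye = eyexe for all idempotents e and all x, y."""
    S = transition_semigroup(letters)
    for e in (e for e in S if compose(e, e) == e):
        local = {compose(compose(e, x), e) for x in S}  # all elements exe
        for u in local:
            if compose(u, u) != u:                  # (exe)(exe) = exexe
                return False
            for v in local:
                if compose(u, v) != compose(v, u):  # (exe)(eye) = exeye
                    return False
    return True

# Words over {a, b} containing the factor aa (3 states, state 2 is accepting).
contains_aa = {"a": (1, 2, 2), "b": (0, 0, 2)}
# Words over {a, b} with an even number of a's (2 states).
even_a = {"a": (1, 0), "b": (0, 1)}
```

The first automaton passes the test, as expected for a language defined by the presence of a
factor, while the second fails it already on the equation exe = exexe.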

In the case of trees, we were not able to obtain a reasonably simple set of identities for 
characterizing LT. Nevertheless we can show: 

Theorem 2.1. It is decidable whether a regular tree language is in LT. 

Our strategy for proving Theorem 2.1 is as follows. In a first step we provide necessary
conditions for a language to be in LT. In a second step we show that if a language L verifies
those necessary conditions then we can compute from an automaton recognizing L a number
$\kappa$ such that if L is in LT then L is $\kappa$-locally testable. The last step is simple and shows that
once $\kappa$ is fixed, it is decidable whether a regular language is $\kappa$-locally testable. This last step
follows immediately from the fact that once $\kappa$ is fixed, there are only finitely many $\kappa$-locally
testable languages and hence one can enumerate them and test whether L is equivalent to
one of them or not.

Given a regular language L, testing whether L is in LT is then done as follows: (1)
compute from L the $\kappa$ of the second step and (2) test whether L is $\kappa$-locally testable using
the third step. The first step implies that this algorithm is correct.

Before providing the proof details we note that there exist examples showing
that the necessary conditions are not sufficient. Such an example will be provided in
Section 6. We also note that the problem of finding $\kappa$ whenever such a $\kappa$ exists is a special case
of the delay theorem mentioned in the introduction. When applied to LT, the delay theorem
says that if a finite state automaton A recognizes a language in LT then this language
must be $\kappa$-locally testable for a $\kappa$ computable from A. The delay theorem was proved over
words in [Str85] and can be used in order to decide whether a regular language is in LT as
explained in [Boj07a]. We were not able to prove such a general theorem for trees.



3. Necessary conditions 

In this section we exhibit necessary conditions for a regular language to be in LT. These
conditions will play a crucial role in our decision algorithm. They are expressed
using the same formalism as the one used in [BS10] for characterizing LTT.

Guarded operations. Let t be a tree, and x, x' be two nodes of t such that x and x' are
not related by the descendant relationship. The horizontal swap of t at nodes x and x' is
the tree t' constructed from t by replacing $t|_x$ with $t|_{x'}$ and vice-versa, see Figure 1 (left).
A horizontal swap is said to be k-guarded if x and x' have the same k-type.

Let t be a tree and x, y, z be three nodes of t such that x, y, z are not related by the
descendant relationship and such that $t|_x = t|_y$. The horizontal transfer of t at x,y,z is
the tree t' constructed from t by replacing $t|_y$ with a copy of $t|_z$, see Figure 1 (right). A
horizontal transfer is k-guarded if x, y, z have the same k-type.




Figure 1: Horizontal Swap (left) and Horizontal Transfer (right) 
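
As a small illustration of the horizontal swap, the following Python sketch applies a k-guarded
swap to a concrete tree and compares the sets of types before and after. The tuple encoding of
trees, the addressing of nodes by strings over "01" and all function names are assumptions made
for this example.

```python
# Trees as nested tuples: leaf (label,), inner node (label, left, right).
# A node is addressed by a string over "01" (0 = left child, 1 = right child).
# Encoding and names are illustrative assumptions.

def truncate(t, k):
    """k-type of the root of t."""
    if k == 0 or len(t) == 1:
        return (t[0],)
    return (t[0], truncate(t[1], k - 1), truncate(t[2], k - 1))

def k_types(t, k):
    """Set of k-types occurring in t."""
    types = {truncate(t, k)}
    if len(t) == 3:
        types |= k_types(t[1], k) | k_types(t[2], k)
    return types

def subtree(t, pos):
    """Subtree of t rooted at position pos."""
    for bit in pos:
        t = t[1 + int(bit)]
    return t

def replace(t, pos, new):
    """Copy of t in which the subtree at position pos is replaced by new."""
    if not pos:
        return new
    parts = list(t)
    i = 1 + int(pos[0])
    parts[i] = replace(t[i], pos[1:], new)
    return tuple(parts)

def horizontal_swap(t, x, y, k):
    """Exchange the subtrees at x and y, provided the swap is k-guarded."""
    if x.startswith(y) or y.startswith(x):
        raise ValueError("nodes must not be related by the descendant relation")
    sx, sy = subtree(t, x), subtree(t, y)
    if truncate(sx, k) != truncate(sy, k):
        raise ValueError("swap is not k-guarded")
    return replace(replace(t, x, sy), y, sx)

b = ("b",)
ab = ("a", b, b)
s1 = ("a", ab, b)
s2 = ("a", s1, b)
t = ("c", s1, s2)
swapped = horizontal_swap(t, "0", "1", 1)
```

On this tree the 1-guarded swap leaves the set of 2-types unchanged but modifies the set of
3-types, and the same swap is rejected when 2-guardedness is required.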



Let t be a tree of root a, and x, y, z be three nodes of t such that y is a descendant of
x and z is a descendant of y. The vertical swap of t at x,y,z is the tree t' constructed from
t by swapping the context between x and y with the context between y and z, see Figure 2
(left). More formally let C = t[a, x], $\Delta_1$ = t[x,y], $\Delta_2$ = t[y,z] and T = $t|_z$. We then have
$t = C \cdot \Delta_1 \cdot \Delta_2 \cdot T$. The tree t' is defined as $t' = C \cdot \Delta_2 \cdot \Delta_1 \cdot T$. A vertical swap is k-guarded
if x, y, z have the same k-type.

Let t be a tree of root a, and x,y,z be three nodes of t such that y is a descendant of x
and z is a descendant of y such that $\Delta$ = t[x,y] = t[y,z]. The vertical stutter of t at x,y,z
is the tree t' constructed from t by removing the context between x and y, see Figure 2
(right). A vertical stutter is k-guarded if x,y,z have the same k-type.




Figure 2: Vertical Swap (left) and Vertical Stutter (right) 
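
The vertical swap can be sketched in the same spirit. The code below is a minimal illustration
under the same assumed encoding (nested tuples, positions as strings over "01"); the function
names and the example tree are hypothetical.

```python
# Trees as nested tuples: leaf (label,), inner node (label, left, right).
# Positions are strings over "01". Encoding and names are illustrative assumptions.

def truncate(t, k):
    if k == 0 or len(t) == 1:
        return (t[0],)
    return (t[0], truncate(t[1], k - 1), truncate(t[2], k - 1))

def k_types(t, k):
    types = {truncate(t, k)}
    if len(t) == 3:
        types |= k_types(t[1], k) | k_types(t[2], k)
    return types

def subtree(t, pos):
    for bit in pos:
        t = t[1 + int(bit)]
    return t

def replace(t, pos, new):
    if not pos:
        return new
    parts = list(t)
    i = 1 + int(pos[0])
    parts[i] = replace(t[i], pos[1:], new)
    return tuple(parts)

def vertical_swap(t, x, y, z, k):
    """Exchange the contexts t[x, y] and t[y, z], provided the swap is k-guarded."""
    if not (y.startswith(x) and z.startswith(y)):
        raise ValueError("y must be below x and z below y")
    tx, ty, tz = subtree(t, x), subtree(t, y), subtree(t, z)
    if not (truncate(tx, k) == truncate(ty, k) == truncate(tz, k)):
        raise ValueError("swap is not k-guarded")
    lower = replace(tx, y[len(x):], tz)     # Delta_1 . T
    upper = replace(ty, z[len(y):], lower)  # Delta_2 . Delta_1 . T
    return replace(t, x, upper)             # C . Delta_2 . Delta_1 . T

b, c = ("b",), ("c",)
ab = ("a", b, b)
T0 = ("a", ab, b)
t = ("a", ("a", ("a", T0, c), b), b)
swapped = vertical_swap(t, "", "0", "000", 1)
```

Here the nodes at positions "", "0" and "000" all have the same 1-type, and the swap exchanges
the two contexts while keeping the set of 2-types of the tree unchanged.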

Let L be a tree language and k be a number. If X is any of the four constructions
above, horizontal or vertical swap, or vertical stutter or horizontal transfer, we say that L
is closed under k-guarded X if for every tree t and every tree t' constructed from t using
k-guarded X, t is in L iff t' is in L. Notice that being closed under k-guarded X implies
being closed under k'-guarded X for k' > k. An important observation is that none of the
k-guarded operations affects the set of (k+1)-types occurring in the trees.

If L is closed under all the k-guarded operations described above, we say that L is
k-tame. A language is said to be tame if it is k-tame for some k.

The following simple result shows that tameness is a necessary condition for LT. 






T. PLACE AND L. SEGOUFIN 



Proposition 3.1. If L is in LT then L is tame. 

Proof. Assume L is in LT. Then there is a $\kappa$ such that L is $\kappa$-locally testable. We show
that L is $\kappa$-tame. This is a straightforward consequence of the fact that all the $\kappa$-guarded
operations above preserve $(\kappa+1)$-types and hence preserve $\kappa$-types. □

A simple pumping argument shows that if L is tame then it is k-tame for k bounded by
a polynomial in the size of the minimal deterministic bottom-up tree automaton recognizing
L.

Proposition 3.2. Given a regular language L and A the minimal deterministic bottom-up
tree automaton recognizing L, we have L is tame iff L is $k_0$-tame for $k_0 = |A|^3 + 1$.

Proof. We prove that if X is one of the four operations that define tameness, then if L is
closed under k-guarded X for $k > k_0$, then L is closed under $k_0$-guarded X. This will imply
that if L is k-tame then it is $k_0$-tame.

Consider the case of k-guarded horizontal transfer and assume L is closed under k-guarded
horizontal transfers. We show that L is closed under $k_0$-guarded horizontal transfers.
Let t be a tree and x, y, z three nodes of t having the same $k_0$-type and not related by
the descendant relation such that $t|_x = t|_y$. We need to show that replacing $t|_y$ by a copy
of $t|_z$ does not affect membership in L.

We do this in three steps: first we transform t by pumping in parallel in the subtrees of
x, y and z until x, y, z have the same k-type, then we use the closure of L under k-guarded
horizontal transfer in order to replace $t|_y$ by a copy of $t|_z$, and finally we backtrack the
initial pumping phase in order to recover the initial subtrees.

We let $t_1 = t|_x$ and $t_2 = t|_z$ and we assume from now on that $t_1 \neq t_2$. By position we
denote a string w of $\{0,1\}^*$. A position w is realized in a tree t if there is a node x of t such
that if $x_1, \ldots, x_n = x$ is the sequence of nodes in the path from the root of t to x then for
all i < n the i-th bit of w is zero if $x_i$ is a left child and it is one if $x_i$ is a right child. We
order positions by first comparing their respective length and then using the lexicographical
order.

By hypothesis $t_1$ and $t_2$ are identical up to depth at least $k_0$. Let w be the first position
such that $t_1$ and $t_2$ differ at that position. That can be either because w is realized in $t_1$
but not in $t_2$, or vice versa, or w is realized in both trees but the labels of the corresponding
nodes differ. We know that the length n of w is strictly greater than $k_0$. If n > k, we are
done with the first phase. We assume now that $n \leq k$.
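
The first position at which two trees differ, for the order on positions defined above (length
first, then lexicographic), can be computed by the following Python sketch. The tuple encoding
of trees and the names `node_at` and `first_difference` are assumptions made for this
illustration; the enumeration is exhaustive and therefore only suited to small examples.

```python
from itertools import count, product

# Trees as nested tuples: leaf (label,), inner node (label, left, right).
# Encoding and names are illustrative assumptions.

def node_at(t, pos):
    """Subtree at position pos, or None if pos is not realized in t."""
    for bit in pos:
        if len(t) == 1:
            return None
        t = t[1 + int(bit)]
    return t

def first_difference(t1, t2):
    """First position (shorter first, then lexicographic) where t1 and t2 differ."""
    for n in count():
        realized = False
        for bits in product("01", repeat=n):
            w = "".join(bits)
            u, v = node_at(t1, w), node_at(t2, w)
            if u is None and v is None:
                continue
            realized = True
            # w is realized in only one tree, or the labels differ
            if u is None or v is None or u[0] != v[0]:
                return w
        if not realized:
            return None  # the two trees are identical

b, c = ("b",), ("c",)
t1 = ("a", b, ("a", b, b))
t2 = ("a", b, ("a", b, c))
t3 = ("a", ("a", b, b), b)
```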

Consider the run r of A on t. The run assigns a state q to each node of t. From r we
assign to each position w' < w a pair of states (q, q') such that q is the state given by r at
the corresponding node in $t_1$ while q' is the state given by r at the corresponding node in
$t_2$. Because $n > k_0 > |A|^2$, there must be two prefixes $w_1$ and $w_2$ of w that were assigned
the same pair of states. Consider the context $C_1 = t_1[v_1, v_2]$ where $v_1$ and $v_2$ are the nodes
of $t_1$ at positions $w_1$ and $w_2$, and the context $C_2 = t_2[v'_1, v'_2]$ where $v'_1$ and $v'_2$ are the nodes
of $t_2$ at positions $w_1$ and $w_2$. Without affecting membership in L, we can therefore at the
same time duplicate $C_1$ in the two copies of $t_1$ rooted at x and y and $C_2$ in the copy of $t_2$
rooted at z.
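
The pigeonhole step of this argument can be sketched in Python. The sketch below assumes a
deterministic bottom-up automaton given by two hypothetical functions, `leaf_state` and `step`;
since the state of such an automaton at a node depends only on the subtree below it, the runs on
the two subtrees are computed separately. The example automaton (parity of the number of nodes
labeled a) and all names are assumptions made for this illustration.

```python
# Trees as nested tuples: leaf (label,), inner node (label, left, right).
# Positions are strings over "01". Encoding and names are illustrative assumptions.

def run(t, leaf_state, step, pos=""):
    """Bottom-up run: dictionary mapping each position of t to its state."""
    if len(t) == 1:
        return {pos: leaf_state(t[0])}
    states = run(t[1], leaf_state, step, pos + "0")
    states.update(run(t[2], leaf_state, step, pos + "1"))
    states[pos] = step(t[0], states[pos + "0"], states[pos + "1"])
    return states

def repeated_pair(run1, run2, w):
    """Two prefixes of w carrying the same pair of states, if any."""
    seen = {}
    for n in range(len(w) + 1):
        prefix = w[:n]
        if prefix not in run1 or prefix not in run2:
            break
        pair = (run1[prefix], run2[prefix])
        if pair in seen:
            return seen[pair], prefix
        seen[pair] = prefix
    return None

def subtree(t, pos):
    for bit in pos:
        t = t[1 + int(bit)]
    return t

def replace(t, pos, new):
    if not pos:
        return new
    parts = list(t)
    i = 1 + int(pos[0])
    parts[i] = replace(t[i], pos[1:], new)
    return tuple(parts)

def pump(t, w1, w2):
    """Duplicate once the context of t between positions w1 and w2."""
    s = subtree(t, w1)
    return replace(t, w1, replace(s, w2[len(w1):], s))

def comb(n, last):
    """Left comb of depth n with inner labels b and deepest leaf labeled last."""
    t = (last,)
    for _ in range(n):
        t = ("b", t, ("b",))
    return t

# Hypothetical two-state automaton: parity of the number of nodes labeled a.
leaf_state = lambda a: int(a == "a")
step = lambda a, q1, q2: (q1 + q2 + int(a == "a")) % 2

t1, t2 = comb(5, "b"), comb(5, "a")
r1, r2 = run(t1, leaf_state, step), run(t2, leaf_state, step)
w1, w2 = repeated_pair(r1, r2, "00000")
pumped = pump(t2, w1, w2)
```

Duplicating the context between two positions carrying the same state does not change the state
reached at the root, which is why the pumping does not affect membership in L.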

Let $t'_1$ and $t'_2$ be the subtrees of the resulting tree, rooted respectively at x and z. The
reader can verify that $t'_1$ and $t'_2$ now differ at a position strictly greater than w.

Performing this repeatedly, we eventually arrive at a situation where the subtrees $t'_1$
rooted at x and y agree up to depth k with the subtree $t'_2$ rooted at z. We can now apply
k-guarded horizontal transfer and replace one occurrence of $t'_1$ by a copy of $t'_2$. We can then
replace $t'_1$ by $t_1$ and both copies of $t'_2$ by $t_2$ without affecting membership in L.

The other operations are done similarly. For the horizontal swap, we pump the subtrees
at positions x and x' simultaneously, which is possible because $k_0 > |A|^2$. For vertical swap,
we pump the subtrees at the positions x, y and z simultaneously, and that requires $k_0 > |A|^3$.
Finally, for vertical stutter, we pump the subtrees at the positions x, y and z simultaneously,
which again requires $k_0 > |A|^3$. □

Once k is fixed, a brute force algorithm can check whether L is k-tame or not. Indeed,
as L is regular, when testing for closure under k-guarded X, it is enough to consider all
relevant states and appropriate transition functions of the automaton instead of all trees and
all contexts. See for instance Lemma 12 and Lemma 13 in [BS10].

Therefore Proposition 3.2 implies that tameness is decidable. However for deciding LT
we will only need the bound $k_0$ given by the proposition.



4. Deciding LT

In this section we show that it is decidable whether a regular tree language is in LT. This is
done by showing that if a regular language L is in LT then there is a $\kappa$ computable from an
automaton recognizing L such that L is in fact $\kappa$-locally testable. Recall that once this $\kappa$ is
computed the decision procedure simply enumerates all the finitely many $\kappa$-locally testable
languages and tests whether L is one of them.

Assume L is in LT. By Proposition 3.1, L is tame. Even more, from Proposition 3.2,
one can effectively compute a k such that L is k-tame. Hence Theorem 2.1 follows from the
following proposition.

Proposition 4.1. Assume L is a k-tame regular tree language. Then L is in LT iff L is
$\kappa$-locally testable, where $\kappa$ is computable from k.

Recall that for each k the number of k-types is finite. Let $\beta_k$ be this number.
Proposition 4.1 is an immediate consequence of the following proposition.

Proposition 4.2. Let L be a k-tame regular tree language. Set $\kappa = \beta_k + k + 1$. Then for
all $l > \kappa$ and any two trees t, t', if $t \simeq_\kappa t'$ then there exist two trees T, T' with

(1) $t \in L$ iff $T \in L$

(2) $t' \in L$ iff $T' \in L$

(3) $T \simeq_l T'$

Proof of Proposition 4.1 using Proposition 4.2. Assume L is k-tame and let $\kappa$ be defined as
in Proposition 4.2. We show that L is in LT iff L is $\kappa$-locally testable. Assume L is in LT.
Then L is l-locally testable for some $l \in \mathbb{N}$, and we may assume $l > \kappa$. We show that L is
actually $\kappa$-locally testable.
For this it suffices to show that for any pair of trees t and t', if $t \simeq_\kappa t'$ then $t \in L$ iff $t' \in L$.
Let T and T' be the trees constructed for l from t and t' by Proposition 4.2. We have
$T \simeq_l T'$ and therefore $T \in L$ iff $T' \in L$. As we also have $t \in L$ iff $T \in L$ and $t' \in L$ iff
$T' \in L$, the proposition is proved. □

Before proving Proposition 4.2 we need some extra terminology. A non-empty context
C occurring in a tree t is a loop of k-type $\tau$ if the k-type of its root and the k-type of its port
is $\tau$. A non-empty context C occurring in a tree t is a k-loop if there is some k-type $\tau$ such
that C is a loop of k-type $\tau$. Given a context C we call the path from the root of C to its
port the principal path of C. Finally, the result of the insertion of a k-loop C at a node x
of a tree t is a tree T such that if $t = D \cdot t|_x$ then $T = D \cdot C \cdot t|_x$. Typically an insertion will
occur only when the k-type of x is $\tau$ and C is a loop of k-type $\tau$. In this case the k-types
of the nodes initially from t and of the nodes of C are unchanged by this operation.
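
The insertion of a loop can be illustrated by a short Python sketch. The encoding of trees as
nested tuples, the description of a context by a host tree together with the positions of its
root and port, and all function names are assumptions made for this example.

```python
# Trees as nested tuples: leaf (label,), inner node (label, left, right).
# Positions are strings over "01". Encoding and names are illustrative assumptions.

def truncate(t, k):
    if k == 0 or len(t) == 1:
        return (t[0],)
    return (t[0], truncate(t[1], k - 1), truncate(t[2], k - 1))

def k_types(t, k):
    types = {truncate(t, k)}
    if len(t) == 3:
        types |= k_types(t[1], k) | k_types(t[2], k)
    return types

def subtree(t, pos):
    for bit in pos:
        t = t[1 + int(bit)]
    return t

def replace(t, pos, new):
    if not pos:
        return new
    parts = list(t)
    i = 1 + int(pos[0])
    parts[i] = replace(t[i], pos[1:], new)
    return tuple(parts)

def insert_loop(t, x, host, root, port, k):
    """Insert at node x of t the context of host between root and port."""
    if not (port.startswith(root) and port != root):
        raise ValueError("the context must be non-empty")
    tau = truncate(subtree(t, x), k)
    if not (truncate(subtree(host, root), k) == tau == truncate(subtree(host, port), k)):
        raise ValueError("the context is not a loop of the k-type of x")
    # C . t|_x: plug the subtree of t at x into the port of the context
    plugged = replace(subtree(host, root), port[len(root):], subtree(t, x))
    return replace(t, x, plugged)  # D . C . t|_x

b, c = ("b",), ("c",)
ab = ("a", b, b)
t = ("a", ab, b)
host = ("a", ("a", ("a", t, c), b), b)
T = insert_loop(t, "", host, "", "0", 1)
```

In this example the inserted context is a 1-loop whose type is the 1-type of the root of `t`,
and the set of 1-types of the resulting tree is the same as that of `t`.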

Proof of Proposition 4.2. Suppose that L is k-tame. We start by proving two lemmas that
will be useful in the construction of T and T'. Essentially these lemmas show that even
though being k-tame does not imply being (k+1)-locally testable (recall the remark after
Theorem 2.1), some of the expected behavior of (k+1)-locally testable languages can still
be derived from being k-tame. The first lemma shows that given a tree t, without affecting
membership in L, we can replace a subtree of t containing only (k+1)-types occurring
elsewhere in t by any other subtree satisfying this property and having the same k-type as
root. The second lemma shows the same result for contexts by showing that a k-loop can
be inserted in a tree t without affecting membership in L as soon as all the (k+1)-types
of the k-loop were already present in t. After proving these lemmas we will see how to
combine them for constructing T and T'.

Lemma 4.3. Assume L is k-tame. Let t = Ds be a tree where s is a subtree of t. Let s' be
another tree such that the roots of s and s' have the same k-type.
If $s \preceq_{k+1} D$ and $s' \preceq_{k+1} D$ then $Ds \in L$ iff $Ds' \in L$.

Proof. We start by proving a special case of the lemma where s' is actually another subtree
of t. We will use this particular case repeatedly in the proof.

Claim 4.4. Assume L is k-tame. Let t be a tree and let x, y be two nodes of t not related
by the descendant relationship and with the same k-type. We write $s = t|_x$, $s' = t|_y$ and C
the context such that t = Cs. If $s \preceq_{k+1} C$ then $Cs \in L$ iff $Cs' \in L$.

Proof. The proof is done by induction on the depth of s and makes crucial use of k-guarded
horizontal transfer.

Assume first that s is of depth less than k. Since x and y have the same k-type, we
have s = s' and the result follows.

Assume now that s is of depth greater than k.

Let $\tau$ be the (k+1)-type of x. We assume that s is a tree of the form $a(s_1, s_2)$. Notice
that the k-types of the roots of $s_1$ and $s_2$ are completely determined by $\tau$. Since $s \preceq_{k+1} C$,
there exists a node z in C of type $\tau$. We write $s'' = t|_z$.

We consider several cases depending on the relationship between x, y and z. We first 
consider the case where x and z are not related by the descendant relationship, then we 
reduce the other cases to this case. 

Assume that x and z are not related by the descendant relationship. Since s'' is of
type $\tau$, it is of the form $a(s''_1, s''_2)$ where the roots of $s''_1$ and $s''_2$ have the same k-type as
respectively the roots of $s_1$ and $s_2$. By hypothesis all the (k+1)-types of $s_1$ and $s_2$ already
appear in C and hence we can apply the induction hypothesis to replace $s_1$ by $s''_1$ and $s_2$ by
$s''_2$ without affecting membership in L. Notice that the resulting tree is Cs'', that $t = Cs \in L$
iff $Cs'' \in L$, and that Cs'' contains two copies of the subtree s'', one at position x and one
at position z. We now show that we can derive Cs' from Cs'' using k-guarded operations.
Since L is k-tame it will follow that $Cs'' \in L$ iff $Cs' \in L$ and thus $Cs \in L$ iff $Cs' \in L$.
Let t'' = Cs''. We distinguish between three cases depending on the relationship between
z and y in t'':

(1) If z is a descendant of y, let D = t''[y, z] and notice that s' = Ds''. Since x, y and z
have the same k-type, we use k-guarded vertical stutter to duplicate D and a k-guarded
horizontal swap to move the new copy of D at position x (see the picture below). The
resulting tree is Cs' as desired.

(2) If z is an ancestor of y, let D = t''[z, y] and notice that s'' = Ds'. Since y and x have the
same k-type, we use k-guarded horizontal swap followed by a k-guarded vertical stutter
to delete the copy of D (see the picture below). The resulting tree is Cs' as desired.

(3) If z and y are not related by the descendant relation, then x, y and z have the same
k-type and $t''|_x = t''|_z$. We use k-guarded horizontal transfer to replace $t''|_x$ with $t''|_y$
as depicted below.




This concludes the case where x and z are not related by the descendant relationship
in t. We are left with the case where x is a descendant of z (recall that z is outside s and
therefore not a descendant of x). We reduce this problem to the previous case by considering
two subcases:

• If y, z are not related by the descendant relationship, we use a k-guarded horizontal swap
to replace s by s' and vice versa. This reverses the roles of x and y and as y and z are
not related by the descendant relationship and position y now has (k+1)-type $\tau$ we can
apply the previous case.
• If z is an ancestor of both x and y we use k-guarded vertical stutter to duplicate the
context between z and x. This introduces a new node z' of type $\tau$ that is not related to
y by the descendant relationship and we are back in the previous case.




□ 

We now turn to the proof of Lemma 4.3. The proof is done by induction on the depth
of s'. The idea is to replace s with s' node by node.

Assume first that s' is of depth less than k. Then because the k-types of the roots of s
and s' are equal, we have s = s' and the result follows.

Assume now that s' is of depth greater than k.

Let x be the node of t corresponding to the root of s. Let $\tau$ be the (k+1)-type of
the root of s'. We assume that s' is a tree of the form $a(s'_1, s'_2)$. Notice that the k-types of
the roots of $s'_1$ and $s'_2$ are completely determined by $\tau$. By hypothesis $s' \preceq_{k+1} D$, hence
there exists a node y in D of type $\tau$. We consider two cases depending on the relationship
between x and y.

• If y is an ancestor of x, let E be t[y, x] and notice that x and y have the same k-type.
This case is depicted below. Hence applying a k-guarded vertical stutter we can duplicate
E obtaining the tree DEs. Because L is k-tame, $DEs \in L$ iff $t = Ds \in L$. Now the
root of Es in DEs is of type $\tau$ and Es is therefore of the form $a(s_1, s_2)$ where the roots of $s_1$
and $s_2$ have the same k-type as respectively the roots of $s'_1$ and $s'_2$. By construction all
the (k+1)-types of $s_1$ and $s_2$ already appear in D and hence we can apply the induction
hypothesis to replace $s_1$ by $s'_1$ and $s_2$ by $s'_2$ without affecting membership in L. Altogether
this gives the desired result.




• Assume now that x and y are not related by the descendant relationship. This case is
depicted below. Let s'' be the subtree of Ds rooted at y. By hypothesis all the (k+1)-types
of s are already present in D and the roots of s and s'' have the same k-type.
Hence we can apply Claim 4.4 and we have $Ds \in L$ iff $Ds'' \in L$. Now the root of s''
is by construction of type $\tau$. Hence s'' is of the form $a(s_1, s_2)$ where $s_1$ and $s_2$ have all
their (k+1)-types appearing in D and their roots have the same k-type as respectively
$s'_1$ and $s'_2$. Hence by induction $s_1$ can be replaced by $s'_1$ and $s_2$ by $s'_2$ without affecting
membership in L. Altogether this gives the desired result.



We now prove a similar result for k-loops.

Lemma 4.5. Assume L is k-tame. Let t be a tree and x a node of t of k-type $\tau$. Let t'
be another tree such that $t \simeq_{k+1} t'$ and C be a k-loop of type $\tau$ in t'. Consider the tree T
constructed from t by inserting a copy of C at x. Then $t \in L$ iff $T \in L$.

Proof. The proof is done in two steps. First we use the k-tame property of L to show that
we can insert a k-loop C' at x in t such that the principal path of C is the same as the
principal path of C'. By this we mean that there is a bijection from the principal path
of C' to the principal path of C that preserves the child relation and (k+1)-types. In a
second step we replace one by one the subtrees hanging from the principal path of C' with
the corresponding subtrees in C.

First some terminology. Given two nodes y, y' of some tree T, we say that y' is an
l-ancestor of y if y is a descendant of the left child of y'. Similarly we define r-ancestorship.

Consider the context C occurring in t'. Let $y_0, \ldots, y_n$ be the nodes of t' on the principal
path of C and $\tau_0, \ldots, \tau_n$ be their respective (k+1)-types. For $0 \leq i < n$, set $c_i$ to l if $y_{i+1}$
is a left child of $y_i$ and r otherwise.

From t we construct using k-guarded swaps and k-guarded vertical stutters a tree $t_1$ such that
there is a sequence of nodes $x_0, \ldots, x_n$ in $t_1$ with for all $0 \leq i < n$, $x_i$ is of type $\tau_i$ and $x_i$
is a $c_i$-ancestor of $x_{i+1}$. The tree $t_1$ is constructed by induction on n (note that this step
does not require that C is a k-loop). If n = 0 then this is a consequence of $t \simeq_{k+1} t'$ that
one can find in t a node of type $\tau_0$. Consider now the case n > 0. By induction we have
constructed from t a tree $t'_1$ such that $x_0, \ldots, x_{n-1}$ is an appropriate sequence in $t'_1$. By
symmetry it is enough to consider the case where $y_n$ is the left child of $y_{n-1}$. Because all
k-guarded operations preserve (k+1)-types, we have $t \simeq_{k+1} t'_1$ and hence there is a node
x' of $t'_1$ of type $\tau_n$. If $x_{n-1}$ is an l-ancestor of x' then we are done. Otherwise consider the
left child x'' of $x_{n-1}$ and notice that because $y_n$ is a child of $y_{n-1}$ and $x_{n-1}$ has the same
(k+1)-type as $y_{n-1}$ then x'', $y_n$ and x' have the same k-type.

We know that $x'$ is not a descendant of $x''$. There are two cases. If $x'$ and $x''$ are not
related by the descendant relationship then by $k$-guarded swaps we can replace the subtree
rooted in $x''$ by the subtree rooted in $x'$ and we are done. If $x'$ is an ancestor of $x''$ then the
context between $x'$ and $x''$ is a $k$-loop and we can use $k$-guarded vertical stutter to duplicate
it. This places a node having the same $(k+1)$-type as $x'$ as the left child of $x_{n-1}$ and we
are done.

Figure 3: The construction of $t_2$, eliminating the context $D$ between $x_{i-1}$ and $x_i$.

This concludes the construction of $t_1$. From $t_1$ we construct, using $k$-guarded swaps and
$k$-guarded vertical stutter, a tree $t_2$ such that there is a path $x_0, \ldots, x_n$ in $t_2$ with $x_i$ of
type $\tau_i$ for all $0 \le i \le n$.

Consider the sequence $x_0, \ldots, x_n$ obtained in $t_1$ from the previous step. Recall that
the $k$-type of $x_0$ is the same as the $k$-type of $x_n$. Hence using $k$-guarded vertical stutter we
can duplicate in $t_1$ the context rooted in $x_0$ and whose port is $x_n$. Let $t'_1$ be the resulting tree.
We thus have two copies of the sequence $x_0, \ldots, x_n$ that we denote by the top copy and the
bottom copy. Assume $x_i$ is not a child of $x_{i-1}$. By symmetry it is enough to consider the
case where $x_{i-1}$ is an l-ancestor of $x_i$. Notice then that the context between the left child of
$x_{i-1}$ and $x_i$ is a $k$-loop. Using $k$-guarded vertical swap we can move the top
copy of this context next to its bottom copy. Using $k$-guarded vertical stutter this extra
copy can be removed. We are left with an instance of the initial sequence in the bottom
copy, while in the top one $x_i$ is a child of $x_{i-1}$. This construction is depicted in Figure 3.

Repeating this argument yields the desired tree $t_2$.

Consider now the context $C' = t_2[x_0, x_n]$. It is a loop of $k$-type $\tau$. Let $T'$ be the tree
constructed from $t$ by inserting $C'$ at $x$.

Claim 4.6. $T' \in L$ iff $t \in L$.

Proof. Consider the sequence of $k$-guarded swaps and $k$-guarded vertical stutters that was
used in order to obtain $t_2$ from $t$. Because $L$ is $k$-tame, $t \in L$ iff $t_2 \in L$.

We can easily identify the nodes of $t$ with the nodes of $T'$ outside of $C'$. Consider the
same sequence of $k$-guarded operations applied to $T'$. Observe that this yields a tree $T_2$
corresponding to $t_2$ with possibly several extra copies of $C'$. As $C'$ is a $k$-loop, the
roots and the ports of these extra copies all have the same $k$-type. Hence, using appropriate
vertical $k$-swaps or appropriate horizontal $k$-swaps, depending on whether two copies are
related or not by the descendant relation, they can be brought together. Two examples of
such operations are given in Figure 4.

Figure 4: Bringing copies of the $k$-loop $C'$ together in Claim 4.6 (panels: Vertical Case, Horizontal Case).

Then, using $k$-guarded vertical stutter, all but one copy can be eliminated, resulting in
$t_2$. Hence $T' \in L$ iff $t_2 \in L$ and the claim is proved. See Figure 5. □

Figure 5: Relation with $t_2$ ($k$-guarded operations, followed by deletion of extra copies of $C'$).

It remains to show that $T' \in L$ iff $T \in L$. By construction of $T'$ we have $C' \preceq_{k+1} t$.
Consider now a node $x_i$ in the principal path of $C'$. Let $T_i$ be the subtree branching out the
principal path of $C$ at $y_i$ and $T'_i$ be the subtree branching out the principal path of $C'$ at
$x_i$. By construction $x_i$ and $y_i$ are of $(k+1)$-type $\tau_i$. Therefore the roots of $T_i$ and $T'_i$ have
the same $k$-type. Because $C' \preceq_{k+1} t$, all the $(k+1)$-types of $T'_i$ already appear in the part
of $T'$ outside of $C'$. By hypothesis we also have $T_i \preceq_{k+1} t$. Hence we can apply Lemma 4.3,
and replacing $T'_i$ with $T_i$ does not affect membership in $L$. A repeated use of that lemma
eventually shows that $T' \in L$ iff $T \in L$. □

We return to the proof of Proposition 4.2. Recall that we have two trees $t, t'$ such that
$t \simeq_\kappa t'$ for $\kappa = \beta_k + k + 1$. For $l > \kappa$, we want to construct $T, T'$ such that:

(1) $t \in L$ iff $T \in L$

(2) $t' \in L$ iff $T' \in L$

(3) $T \simeq_l T'$

Recall that the number of $k$-types is $\beta_k$. Therefore, by choice of $\kappa$, in every branch of a
$\kappa$-type one can find at least one $k$-type that is repeated. This provides many $k$-loops that
can be used, via Lemma 4.5, to obtain bigger types.

Take $l > \kappa$. We build $T$ and $T'$ from $t$ and $t'$ by inserting $k$-loops in $t$ and $t'$ without
affecting their membership in $L$, using Lemma 4.5.

Let $B = \{\tau_0, \ldots, \tau_n\}$ be the set of $k$-types $\tau$ such that there is a loop of $k$-type $\tau$ in $t$ or
in $t'$. For each $\tau \in B$ we fix a context $C_\tau$ as follows. Because $\tau \in B$ there is a context $C$ in
$t$ or $t'$ that is a loop of $k$-type $\tau$. For each $\tau \in B$, we fix arbitrarily such a $C$ and set $C_\tau$ as
$C \cdot \ldots \cdot C$, $l$ concatenations of the context $C$. Notice that the path from the root of $C_\tau$ to
its port then has length at least $l$.

We now describe the construction of $T$ from $t$. The construction of $T'$ from $t'$ is done
similarly. The tree $T$ is constructed by simultaneously inserting, for all $\tau \in B$, a copy of
the context $C_\tau$ at all nodes of $t$ of type $\tau$.

We now show that T and T' have the desired properties. 

The first and second properties, $t \in L$ iff $T \in L$ and $t' \in L$ iff $T' \in L$, essentially
follow from Lemma 4.5. We only show that $t \in L$ iff $T \in L$; the second property is proved
symmetrically. We view $T$ as if it was constructed from $t$ using a sequence of insertions
of some context $C_\tau$ for $\tau \in B$. We write $s_0, \ldots, s_m$ for the sequence of intermediate trees with
$s_0 = t$ and $s_m = T$. We call $C_i$ the context inserted to get $s_{i+1}$ from $s_i$. We show by
induction on $i$ that (i) $s_i \simeq_{k+1} t$ and (ii) $s_i \in L$ iff $s_{i+1} \in L$. This will imply $t \in L$ iff $T \in L$
as desired. (i) is clear for $i = 0$. We show that for all $i$, (i) implies (ii). Recall that $C_i$ is
the concatenation of $l$ copies of a $k$-loop present either in $t$ or in $t'$. We suppose without
loss of generality that the $k$-loop is present in $t$. Let $s$ be the tree constructed from $t$ by duplicating
the $k$-loop $l$ times. Hence $s$ is a tree containing $C_i$, and by construction $s \simeq_{k+1} t$. Because
$t \simeq_\kappa t'$ with $\kappa > k + 1$ and $s_i \simeq_{k+1} t$ we have $s \simeq_{k+1} s_i$. By Lemma 4.5 this implies that
$s_{i+1} \in L$ iff $s_i \in L$. By construction we also have $s_{i+1} \simeq_{k+1} s_i$ and the induction step is
proved.

We now show the third property:

Lemma 4.7. $T \simeq_l T'$.

Proof. We need to show that $T \preceq_l T'$, $T' \preceq_l T$ and that the roots of $T$ and $T'$ have the
same $l$-type. It will be convenient for proving this to view the nodes of $T$ as the union of
the nodes of $t$ plus some nodes coming from the $k$-loops that were inserted. To do this more
formally, if $x$ is a node of $t$ of $k$-type not in $B$, then $x$ is identified with the corresponding
node of $T$. If $x$ is a node of $t$ whose $k$-type $\tau$ is in $B$ then $x$ is identified in $T$ with the port
of the copy of $C_\tau$ that was inserted at node $x$. We start with the following claim.

Claim 4.8. Take two nodes $x$ in $t$ and $x'$ in $t'$, such that $x$ and $x'$ have the same $\kappa$-type.
Let $y$ and $y'$ be the corresponding nodes in $T$ and $T'$. Then $y$ and $y'$ have the same $l$-type.

Proof. Let $\nu$ be the $\kappa$-type of $x$ and $x'$. Consider a branch of $\nu$ of length $\kappa$. By the choice of
$\kappa$ we know that in this branch one can find two nodes $z$ and $z'$ with the same $k$-type $\tau$,
with $z$ an ancestor of $z'$ and such that the $k$-type $\tau$ of $z$ is determined by $\nu$ ($z$ is at distance
at least $k$ from the leaves of $\nu$). Hence $\tau$ is in $B$. Note that because the $k$-type of $z$ is included
in $\nu$, the presence of a node of type $\nu$ induces the presence of a node of type $\tau$ at the same
relative position as $z$. Hence a copy of $C_\tau$ is inserted simultaneously at the same position
relative to $y$ and $y'$ during the construction of $T$ and $T'$. Because this is true for all branches
of $\nu$ and because all $C_\tau$ have depth at least $l$, the nodes $y$ and $y'$ have the same $l$-type. □

From Claim 4.8 it follows that the roots of $T$ and $T'$ have the same $l$-type. By symmetry
we only need to show that $T \preceq_l T'$. Let $y$ be a node of $T$ and $\mu$ be its $l$-type. We show
that there exists $y' \in T'$ with type $\mu$. We consider two cases:

• $y$ is not a node of a loop inserted during the construction of $T$. Let $x$ be the corresponding
position in $t$ and let $\nu$ be its $\kappa$-type. Since $t \simeq_\kappa t'$, there is a node $x'$ of $t'$ of type $\nu$. Let
$y'$ be the node of $T'$ corresponding to $x'$. By Claim 4.8, $y$ and $y'$ have the same $l$-type.

• $y$ is a node inside a copy of $C_\tau$ inserted to construct $T$. Let $x$ be the node of $t$ where this
loop was inserted. Let $\nu$ be the $\kappa$-type of $x$ (the $k$-type of $x$ is $\tau$). Since $t \simeq_\kappa t'$, there is
a node $x'$ of $t'$ of type $\nu$. Since $\kappa > k$, $x$ and $x'$ have the same $k$-type, so a copy of $C_\tau$ was
also inserted in $t'$ at position $x'$ during the construction of $T'$. From Claim 4.8, $x$ and $x'$,
when viewed as nodes of $T$ and $T'$, have the same $l$-type. Let $y'$ be the node of $T'$ in the
copy of $C_\tau$ inserted at $x'$ that corresponds to the position $y$. Since $y$ and $y'$ are ancestors
of $x$ and $x'$ that have the same $l$-type, and since the context from $y$ to $x$ is the same as
the context from $y'$ to $x'$, the nodes $y$ and $y'$ must have the same $l$-type. □

This concludes the proof of Proposition 4.2. □

5. UNRANKED TREES 

In this section we consider unranked unordered trees with labels in $\Sigma$. In such trees, each
node may have an arbitrary number of children but no order is assumed on these children.
In particular even if a node has only two children we cannot necessarily distinguish the left
child from the right child.

Our goal is to adapt the result of the previous section and provide a decidable
characterization of locally testable languages of unranked unordered trees.

In this section by regular language we mean definable in the logic MSO using only the
child predicate and unary predicates for the labels of the nodes. There is also an equivalent
automata model that we briefly describe next. A tree automaton $A$ over unordered unranked
trees consists essentially of a finite set of states $Q = \{q_1, \ldots, q_k\}$, an integer $m$ called the
counter threshold in the sequel, and a transition function $\delta$ associating a unique state to any
pair consisting of a label and a tuple $(q_1, \gamma_1) \ldots (q_k, \gamma_k)$ where $\gamma_i \in \{= j \mid j < m\} \cup \{\ge m\}$.
The meaning is straightforward via bottom-up evaluation: a node of label $a$ gets assigned
a state $q$ if for all $i$, the number of its children, up to threshold $m$, that were assigned state
$q_i$ is as specified by $\delta$. In the sequel we assume without loss of generality that all our tree
automata are deterministic.
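The bottom-up evaluation just described can be sketched as follows. This is only an illustrative sketch: the encoding of trees as `(label, children)` pairs, the function name `evaluate`, and the example automaton (`STATES`, `M`, `DELTA`) are assumptions made for this example and do not come from the paper.

```python
from collections import Counter
from itertools import product

def evaluate(tree, delta, states, m):
    """State reached at the root of an unordered tree (label, children).

    delta maps (label, profile) to a state, where profile lists, for each
    state in `states`, the number of children in that state capped at m.
    """
    label, children = tree
    counts = Counter(evaluate(c, delta, states, m) for c in children)
    profile = tuple(min(counts[q], m) for q in states)
    return delta[(label, profile)]

# Hypothetical automaton: a subtree is in state "yes" iff it contains a "b".
STATES = ("no", "yes")
M = 1  # counter threshold
DELTA = {
    (label, profile): "yes" if label == "b" or profile[1] >= 1 else "no"
    for label in ("a", "b")
    for profile in product(range(M + 1), repeat=len(STATES))
}

t = ("a", [("a", []), ("a", [("b", [])])])
print(evaluate(t, DELTA, STATES, M))
```

Because the profile only records capped counts per state, the result does not depend on the order in which children are listed, which matches the unordered setting.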

In the unranked tree case, there are several natural definitions of LT. Recall the
definition of $k$-type: the $k$-type of a node $x$ is the isomorphism type of the subtree induced
by the descendants of $x$ at distance at most $k$ from $x$. With unranked trees this definition
generates infinitely many $k$-types. We therefore introduce a more flexible notion of type,
$(k,l)$-type, based on one extra parameter $l$ restricting the horizontal information. It is
defined by induction on $k$. Consider an unordered tree $t$ and a node $x$ of $t$. For $k = 0$, the
$(k,l)$-type of $x$ is just the label of $x$. For $k > 0$ the $(k,l)$-type of $x$ is the label of $x$ together
with, for each $(k-1,l)$-type, the number, up to threshold $l$, of children of $x$ of this type.
The reader can verify that over binary trees, the $(k,2)$-type and the $k$-type of $x$ always
coincide. As in the previous section we say that two trees are $(k,l)$-equivalent, and denote
this using $\simeq_{(k,l)}$, if they have the same occurrences of $(k,l)$-types and their roots have the
same $(k,l)$-type. We also use $t \preceq_{(k,l)} t'$ to denote the fact that all $(k,l)$-types of $t$ also occur
in $t'$.
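The inductive definition of $(k,l)$-types can be illustrated with a small sketch. The tree encoding as `(label, children)` pairs and the names `kl_type`, `all_types` and `kl_equivalent` are assumptions made for this example; types that do not occur among the children are left implicit (count 0).

```python
from collections import Counter

def kl_type(tree, k, l):
    """(k, l)-type of the root of an unordered tree (label, children)."""
    label, children = tree
    if k == 0:
        return label
    counts = Counter(kl_type(c, k - 1, l) for c in children)
    # Record each occurring (k-1, l)-type with its count capped at l.
    return (label, frozenset((ty, min(n, l)) for ty, n in counts.items()))

def all_types(tree, k, l):
    """Set of (k, l)-types occurring at some node of the tree."""
    _, children = tree
    found = {kl_type(tree, k, l)}
    for c in children:
        found |= all_types(c, k, l)
    return found

def kl_equivalent(t1, t2, k, l):
    """Same occurring (k, l)-types and same (k, l)-type at the roots."""
    return (kl_type(t1, k, l) == kl_type(t2, k, l)
            and all_types(t1, k, l) == all_types(t2, k, l))

t3 = ("a", [("b", [])] * 3)  # root with three b-leaves
t5 = ("a", [("b", [])] * 5)  # root with five b-leaves
print(kl_equivalent(t3, t5, 1, 2))  # counts 3 and 5 are both capped at 2
print(kl_equivalent(t3, t5, 1, 4))  # counts 3 and 4 differ below the threshold
```

The two calls show how the parameter $l$ controls the horizontal information: the trees are indistinguishable with threshold 2 but not with threshold 4.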

Based on this new notion of type, we define two notions of locally testable languages.
The most expressive one, denoted ALT (A for Aperiodic), is defined as follows. A language
$L$ is in $(k,\lambda)$-ALT if it is a union of $(k,\lambda)$-equivalence classes. A language $L$ is in ALT if
there is a $k$ and a $\lambda$ such that $L$ is in $(k,\lambda)$-ALT.

The second one, denoted ILT in the sequel (I for Idempotent), assumes $\lambda = 1$: a
language $L$ is in ILT if there is a $k$ such that $L$ is a union of $(k,1)$-equivalence classes.

The main result of this section is that we can decide membership for both ILT and 
ALT. 

Theorem 5.1. It is decidable whether a regular unranked unordered tree language is ILT. 
It is decidable whether a regular unranked unordered tree language is ALT. 

Tameness. The notion of $k$-tame is defined as in Section 3 using the same $k$-guarded
operations requiring that the swapped nodes have identical $k$-type. We also define a notion of
$(k,l)$-tame which corresponds to our new notion of $(k,l)$-type. Consider the four operations
of tameness defined in Section 3. A horizontal swap is said to be $(k,l)$-guarded if $x$ and
$x'$ have the same $(k,l)$-type, a horizontal transfer is $(k,l)$-guarded if $x, y, z$ have the same
$(k,l)$-type, a vertical swap is $(k,l)$-guarded if $x, y, z$ have the same $(k,l)$-type and a vertical
stutter is $(k,l)$-guarded if $x, y, z$ have the same $(k,l)$-type. Let $L$ be a regular unranked
unordered tree language and let $m$ be the counting threshold of the minimal automaton
recognizing $L$. We say that $L$ is $(k,l)$-tame iff it is closed under $(k,l)$-guarded horizontal
swap, horizontal transfer, vertical swap and vertical stutter and $l > m$ (we assume $l > m$
in order to make the statements of the results similar to those used in the binary setting).
We first prove that over unordered trees being $k$-tame is the same as being $(k,l)$-tame.

Proposition 5.2. Let $L$ be an unordered unranked regular tree language. Then for all
integers $k$, $L$ is $k$-tame iff there exists $l$ such that $L$ is $(k,l)$-tame. Furthermore, such an $l$ can
be computed from any automaton recognizing $L$.

Proof. If there exists $l$ such that $L$ is $(k,l)$-tame then $L$ is obviously $k$-tame. Suppose that
$L$ is $k$-tame, and let $m$ be the counting threshold of the minimal automaton $A$ recognizing
$L$. We show that there exists $l'$ such that $L$ is closed under $(k,l')$-guarded operations. This
implies the result as one can then take $l = \max(m + 1, l')$.

We need to show that $L$ is closed under $(k,l')$-guarded vertical swap, vertical stutter,
horizontal swap and horizontal transfer. The proof is similar to the proof of Proposition 1
in [BS10]. We will use the following claim, which is proved in [BS10] using a simple pumping
argument:

Claim 5.3. [BS10] For every tree automaton $A$ there is a number $l'$, computable from $A$,
such that for every $k$ if a tree $t_1$ is $(k,l')$-equivalent to a tree $t_2$, then there are trees $t'_1, t'_2$
with $t'_1$ and $t'_2$ $k$-equivalent such that $A$ reaches the same state on $t_i$ as on $t'_i$ for $i = 1, 2$.

We use this claim to prove that $L$ is closed under horizontal transfer. Let $l'$ be the
number computed from $A$ by Claim 5.3. We prove that $L$ is closed under $(k,l')$-guarded
horizontal transfer. Consider a tree $t$ and three nodes $x, y, z$ of $t$ not related by the
descendant relationship and such that $t|_x = t|_y$ and such that $x$, $y$ and $z$ have the same $(k,l')$-type.
Let $t'$ be the horizontal transfer of $t$ at $x, y, z$. Let $t_1 = t|_x$ and $t_2 = t|_z$, and let $t'_1, t'_2$ be obtained
from $t_1, t_2$ using Claim 5.3. Let $s$ be the tree obtained from $t$ by replacing $t|_x$ and $t|_y$ with
$t'_1$ and $t|_z$ with $t'_2$, and let $s'$ be the tree obtained from $t'$ by replacing $t'|_x$ with $t'_1$ and $t'|_y$
and $t'|_z$ with $t'_2$. From Claim 5.3 it follows that $t \in L$ iff $s \in L$ and $t' \in L$ iff $s' \in L$. Since
$L$ is $k$-tame, it is closed under $k$-guarded horizontal transfer, therefore we have $s \in L$ iff
$s' \in L$; it follows that $t \in L$ iff $t' \in L$.

The closure under horizontal swap is proved using the same claim. The proofs for
vertical swap and vertical stutter use a claim similar to Claim 5.3 but for contexts: for
every tree automaton $A$ there is a number $l$ computable from $A$ such that for every $k$ if the
context $C_1$ is $(k,l)$-equivalent to the context $C_2$ (by this we mean that their roots have the
same $(k,l)$-type), then there are contexts $C'_1, C'_2$ with $C'_1$ $k$-equivalent to $C'_2$ such that $C'_i$
induces the same function on the states of $A$ as $C_i$ for $i = 1, 2$. □

From this proposition we know that a regular language over unranked unordered trees is
tame iff it is $k$-tame for some $k$ iff it is $(k,l)$-tame for some $k, l$. Moreover, as in the binary
setting, if a regular language is tame then it is $(k,l)$-tame for some $k$ and $l$ computable
from an automaton recognizing $L$. The bound on $k$ can be obtained by a straightforward
adaptation of Proposition 3.2. The bound on $l$ then follows from Proposition 5.2. Hence
we have:

Proposition 5.4. Let $L$ be a regular language and let $A$ be its minimal deterministic
bottom-up tree automaton. Then $L$ is tame iff $L$ is $(k_0, l_0)$-tame for $k_0 = |A|^3 + 1$ and some $l_0$
computable from $A$.

5.1. Decision of ALT. We now turn to the proof of Theorem 5.1. We begin with the proof
for ALT as both the decision procedure and its proof are obtained as in the case of binary
trees. Assuming tameness we obtain a bound on $\kappa$ and $\lambda$ such that a language is in ALT iff
it is in $(\kappa,\lambda)$-ALT. Once $\kappa$ and $\lambda$ are known, it is easy to decide if a language is in $(\kappa,\lambda)$-ALT
since the number of such languages is finite. The bounds on $\kappa$ and $\lambda$ are obtained following
the same proof structure as in the binary case, essentially replacing $k$-tame by $(k,l)$-tame,
but with several technical modifications. Therefore, we only sketch the proofs below and
only detail the new technical material. Our goal is to prove the following result.

Proposition 5.5. Assume $L$ is a $(k,l)$-tame regular tree language and let $A$ be its minimal
automaton. Then $L$ is in ALT iff $L$ is in $(\kappa,\lambda)$-ALT where $\kappa$ and $\lambda$ are computable from
$k$, $l$ and $A$.

Notice that for each $k, l$ the number of $(k,l)$-types is finite; let $\beta_{(k,l)}$ be this number.
Proposition 5.5 is now a simple consequence of the following proposition.

Proposition 5.6. Let $L$ be a $(k,l)$-tame regular tree language and let $A$ be the minimal
automaton recognizing $L$. Set $\lambda = |A| l + 1$ and $\kappa = \beta_{(k,l)} + k + 1$. Then for all $\kappa' > \kappa$, all
$\lambda' > \lambda$ and any two trees $t, t'$, if $t \simeq_{(\kappa,\lambda)} t'$ then there exist two trees $T, T'$ with

(1) $t \in L$ iff $T \in L$

(2) $t' \in L$ iff $T' \in L$

(3) $T \simeq_{(\kappa',\lambda')} T'$.

Before proving Proposition 5.6 we adapt the extra terminology we used in the proof
of Proposition 4.2 to the unranked setting. A non-empty context $C$ occurring in a tree $t$
is a loop of $(k,l)$-type $\tau$ if the $(k,l)$-type of its root and the $(k,l)$-type of its port is $\tau$. A
non-empty context $C$ occurring in a tree $t$ is a $(k,l)$-loop if there is some $(k,l)$-type $\tau$ such
that $C$ is a loop of $(k,l)$-type $\tau$. Given a context $C$ we call the path from the root of $C$
to its port the principal path of $C$. Finally, the result of the insertion of a $(k,l)$-loop $C$ at
a node $x$ of a tree $t$ is a tree $T$ such that if $t = D \cdot t|_x$ then $T = D \cdot C \cdot t|_x$. Typically an
insertion will occur only when the $(k,l)$-type of $x$ is $\tau$ and $C$ is a loop of $(k,l)$-type $\tau$. In
this case the $(k,l)$-types of the nodes initially from $t$ and of the nodes of $C$ are unchanged
by this operation.
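The insertion $T = D \cdot C \cdot t|_x$ can be made concrete with a short sketch. Everything below is an assumption made for illustration: trees are `(label, children)` pairs, a context is a tree with one `HOLE` marking its port, and a node is addressed by a list of child indices (used purely as addresses, since the trees are unordered). The sketch performs the insertion only; it does not check that the context is a loop of the type of $x$.

```python
HOLE = None  # marks the port of a context

def plug(context, tree):
    """Compose a context with a tree by filling the port."""
    if context is HOLE:
        return tree
    label, children = context
    return (label, [plug(c, tree) for c in children])

def split_at(tree, path):
    """Return (D, t|x) such that plug(D, t|x) == tree, for the node x at `path`."""
    if not path:
        return HOLE, tree
    label, children = tree
    i = path[0]
    d_child, sub = split_at(children[i], path[1:])
    return (label, children[:i] + [d_child] + children[i + 1:]), sub

def insert_loop(tree, path, loop):
    """Build D . C . t|x from t = D . t|x and the context C = loop."""
    d, sub = split_at(tree, path)
    return plug(d, plug(loop, sub))

t = ("a", [("b", [("c", [])]), ("d", [])])
# Root and port position both carry label b: a loop for (0, l)-types.
C = ("b", [HOLE, ("e", [])])
print(insert_loop(t, [0], C))
```

With $k = 0$ the type of a node is just its label, so the context `C` above is a loop of type `b` and can be inserted at the `b`-labelled child of the root.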

Proof of Proposition 5.6. Suppose that $L$ is $(k,l)$-tame. As we did for the proof of the
binary case we first prove two lemmas that are crucial for the construction of $T$ and $T'$.
They show that subtrees can be replaced and contexts can be inserted as long as this does
not change the $(k+1,l)$-equivalence class of the tree. They are direct adaptations of the
corresponding lemmas for the ranked setting: Lemmas 4.3 and 4.5. We start with subtrees.

Lemma 5.7. Assume $L$ is $(k,l)$-tame. Let $t = Ds$ be a tree where $s$ is a subtree of $t$. Let
$s'$ be another tree such that the roots of $s$ and $s'$ have the same $(k,l)$-type.
If $s \preceq_{(k+1,l)} D$ and $s' \preceq_{(k+1,l)} D$ then $Ds \in L$ iff $Ds' \in L$.

Proof sketch. As in the binary setting the proof is done by first proving a restricted version
where $s'$ is actually another subtree of $t$. Before doing that, we state a new claim, specific
to the unranked setting, that will be useful later in the induction bases of our proofs. In
the binary setting, two trees that had the same $k$-type at their root and were of depth
smaller than $k$ were equal. This obviously does not extend to unranked trees and
$(k,l)$-types. However it is simple to see that equality can be replaced by indistinguishability by
the minimal tree automaton recognizing $L$.

Claim 5.8. Let A be a tree automaton and m be its counting threshold. Let t and t' be 
two trees of depth smaller than k and whose roots have the same (k, m)-type. Then t and t' 
evaluate to the same state of A. 

Proof. This is done by induction on $k$. If $k = 0$, $t$ and $t'$ are leaves, and it follows from their
$(0,m)$-type that $t = t'$.

Otherwise we know that $t$ and $t'$ have the same $(k,m)$-type at their root, therefore they
have the same root label. Let $s$ and $s'$ be two trees that are children of the root of $t$ or of
$t'$ and have the same $(k-1,m)$-type at their root. The depth of $s$ and $s'$ is smaller than
$k-1$, therefore by induction hypothesis $s$ and $s'$ evaluate to the same state of $A$. Now,
because the roots of $t$ and $t'$ have the same $(k,m)$-type, for each $(k-1,m)$-type $\tau$, they
have the same number of children of type $\tau$ up to threshold $m$. From the previous remark
this implies that for each state $q$ of $A$, they have the same number of children in state $q$ up
to threshold $m$. It follows from the definition of $A$ that $t$ and $t'$ evaluate to the same state
of $A$. □

We are now ready to state and prove the lemma in the restricted case. 

Claim 5.9. Assume $L$ is $(k,l)$-tame. Let $t$ be a tree and let $x, y$ be two nodes of $t$ not related
by the descendant relationship and with the same $(k,l)$-type. We write $s = t|_x$, $s' = t|_y$ and
$C$ the context such that $t = Cs$. If $s \preceq_{(k+1,l)} C$ then $Cs \in L$ iff $Cs' \in L$.

Proof sketch. This proof only differs from its binary tree counterpart (the restricted version
used in the proof of Lemma 4.3) in the details
of the induction step. It is done by induction on the depth of $s$.

Assume first that $s$ is of depth less than $k$. Since $x$ and $y$ have the same $(k,l)$-type
and since $l > m$ it follows from Claim 5.8 that $s$ and $s'$ evaluate to the same state on the
automaton $A$ recognizing $L$. Hence we can replace $s$ with $s'$ without affecting membership
in $L$.

Assume now that s is of depth greater than k. 

Let $\tau$ be the $(k+1,l)$-type of $x$. We write $s_1, \ldots, s_n$ for the children of $s$ and $a$ for the label
of its root. Since $s \preceq_{(k+1,l)} C$, there exists a node $z$ in $C$ of type $\tau$. We write $s'' = t|_z$.

We now do a case analysis depending on the descendant relationships between $x$, $y$ and
$z$. As for binary trees, all cases reduce to the case when $x$ and $z$ are not related by the
descendant relationship by simple $(k,l)$-tameness operations. Therefore we only consider
this case here.

Assume that $x$ and $z$ are not related by the descendant relationship. We show only
that $Cs \in L$ iff $Cs'' \in L$. The proof that $Cs' \in L$ iff $Cs'' \in L$ is then done exactly as for
binary trees.

Since $x$ and $z$ are of the same $(k+1,l)$-type $\tau$, the roots of $s$ and $s''$ have the same label $a$.
Let $s''_1, \ldots, s''_{n'}$ be the children of the root of $s''$. As in the binary case we want to replace
the trees $s_1, \ldots, s_n$ with these children by induction, since the depth of the trees $s_1, \ldots, s_n$ is
smaller than the depth of $s$. Unfortunately for each $(k,l)$-type $\tau_i$, the number of trees whose
root has type $\tau_i$ among the children of $x$ and among the children of $z$ might not be the
same. However we know that in this case both numbers are greater than $l$. We overcome
this difficulty in two steps. First we modify the children of $x$, without affecting membership
in $L$, so that if $s_i$ and $s_j$ have the same $(k,l)$-type then $s_i = s_j$. Then we use the fact that
$l > m$ in order to delete or duplicate children of $x$ until for each $(k,l)$-type $\tau_i$ the number
of trees with root of type $\tau_i$ among the children of $x$ and among the children of $z$ is the same.
By definition of $A$, this does not affect membership in $L$. Finally we replace the $s_i$ by the
$s''_i$ by induction as in the binary case.

For the first step notice that any of the $s_i$ is by definition of depth smaller than $s$;
therefore by the induction hypothesis we can replace it with any of its siblings having the
same $(k,l)$-type at its root without affecting membership in $L$. □

We now turn to the proof of Lemma 5.7 in its general statement. The proof is done by
induction on the depth of $s'$. The idea is to replace $s$ with $s'$ node by node.

Assume first that $s'$ is of depth smaller than $k$. Then because the $(k,l)$-types of the
roots of $s$ and $s'$ are the same we are in the hypothesis of Claim 5.8 and it follows that $s$
and $s'$ evaluate to the same state of $A$. The result follows.

Assume now that $s'$ is of depth greater than $k$.

Let $x$ be the node of $t$ corresponding to the root of $s$. Let $\tau$ be the $(k+1,l)$-type of
the root of $s'$. In the binary tree case we used a sequence of tame operations to reduce
the problem to the case where $x$ has $(k+1,l)$-type $\tau$. Using the same operations we can
also reduce the problem to this case in the unranked setting. Then we use the induction
hypothesis to replace the children of $x$ by the children of the root of $s'$. As in the proof of
Claim 5.9 the problem is that the number of children might not match, but this is solved
exactly as in the proof of Claim 5.9. □

As in the binary tree case, we now prove a result similar to Lemma 5.7 but for
$(k,l)$-loops.

Lemma 5.10. Assume $L$ is $(k,l)$-tame. Let $t$ be a tree and $x$ a node of $t$ of $(k,l)$-type $\tau$.
Let $t'$ be another tree such that $t \simeq_{(k+1,l)} t'$ and $C$ be a $(k,l)$-loop of type $\tau$ in $t'$. Consider
the tree $T$ constructed from $t$ by inserting a copy of $C$ at $x$. Then $t \in L$ iff $T \in L$.

Proof sketch. The proof has the same structure as that of Lemma 4.5 for the binary case.
First we use the $(k,l)$-tame property of $L$ to show that we can insert a $(k,l)$-loop $C'$ at $x$
in $t$ such that the principal path of $C$ is the same as the principal path of $C'$. By this we
mean that there is a bijection from the principal path of $C'$ to the principal path of $C$ that
preserves the child relation and $(k+1,l)$-types. In a second step we replace one by one the
subtrees hanging from the principal path of $C'$ with the corresponding subtrees in $C$.

Let $T'$ be the tree resulting from inserting $C'$ at position $x$. We do not detail the first
step as it is done using exactly the same sequence of tame operations we used for this step
in the proof of Lemma 4.5. This yields: $t \in L$ iff $T' \in L$. We turn to the second step,
showing that $T' \in L$ iff $T \in L$.

By construction of $T'$ we have $C' \preceq_{(k+1,l)} t$. Consider now a node $x'_i$ in the principal
path of $C'$ and $x_i$ the corresponding node in $C$. As in the binary tree case we replace the
subtrees branching out of the principal path of $C'$ with the corresponding trees branching
out of the principal path of $C$ using Lemma 5.7. As in the previous proof, the problem is
that the numbers of children might not match. This is solved exactly as in the proof of
Lemma 5.7. □

We now turn to the construction of $T$ and $T'$ and prove Proposition 5.6.

The construction is similar to the one we did in the binary tree case. We insert
$(k,l)$-loops in $t$ and $t'$ using Lemma 5.10 for obtaining bigger types. However inserting loops only
affects the depth of the types. Therefore we need to do extra work in order to also increase
the width of the types.

Assuming $t \simeq_{(\kappa,\lambda)} t'$ we first construct two intermediate trees $T_1$ and $T'_1$ that have the
following properties:

• $t \in L$ iff $T_1 \in L$

• $t' \in L$ iff $T'_1 \in L$

• $T_1 \simeq_{(\kappa',\lambda)} T'_1$

This construction is the same as in the binary tree setting so we only briefly describe
it. Let $B = \{\tau_0, \ldots, \tau_n\}$ be the set of $(k,l)$-types $\tau$ such that there is a loop of $(k,l)$-type
$\tau$ in $t$ or in $t'$. For each $\tau \in B$ we fix a context $C_\tau$ as follows. Because $\tau \in B$ there is a
context $C$ in $t$ or $t'$ that is a loop of $(k,l)$-type $\tau$. For each $\tau \in B$, we fix arbitrarily such
a $C$ and set $C_\tau$ as $C \cdot \ldots \cdot C$, $\kappa'$ concatenations of the context $C$. Notice that the path from
the root of $C_\tau$ to its port then has length at least $\kappa'$.

$T_1$ is constructed from $t$ as follows (the construction of $T'_1$ from $t'$ is done similarly).
The tree $T_1$ is constructed by simultaneously inserting, for all $\tau \in B$, a copy of the context
$C_\tau$ at all nodes of $t$ of type $\tau$. By Lemma 5.10 it follows that $t \in L$ iff $T_1 \in L$ and $t' \in L$
iff $T'_1 \in L$. Using the same proof as that of Proposition 4.2 for the binary tree setting, we
obtain $T_1 \simeq_{(\kappa',\lambda)} T'_1$.

We now describe the construction of $T$ from $T_1$; the construction of $T'$ from $T'_1$ is done
similarly. It will be convenient for us to view the nodes of $T_1$ as the union of the nodes of $t$
plus some extra nodes coming from the loops that were inserted.

Let $n$ be the maximum arity of a node of $T_1$ or of $T'_1$. We duplicate subtrees in $T_1$ and
$T'_1$ as follows. Let $x$ be a node of $T_1$ that is not in a loop we inserted when constructing $T_1$
from $t$. For each $(\kappa'-1,\lambda)$-type $\tau$, if $x$ has more than $\lambda$ children of type $\tau$ we duplicate one of
the corresponding subtrees until $x$ has exactly $n$ children of type $\tau$ in total. This is possible
without affecting membership in $L$ because $\lambda > m|A|$. Indeed, because $\lambda > m|A|$, for at
least one state $q$ of $A$, there exist more than $m$ subtrees of $x$ of type $\tau$ for which $A$ assigns
that state $q$ at their root, and by definition of $A$ any of these subtrees can be duplicated
without affecting membership in $L$. The tree $T$ is constructed from $T_1$ by repeating this
operation for any node $x$ of $T_1$ coming from $t$. By construction we have $T_1 \in L$ iff $T \in L$
and therefore $t \in L$ iff $T \in L$. The same construction starting from $T'_1$ yields a tree $T'$ such
that $t' \in L$ iff $T' \in L$.
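The counting argument behind this duplication step is a pigeonhole argument, which the following sketch illustrates. The names `state_to_duplicate` and `capped_profile` and the example values (`STATES`, `M`, `kids`) are hypothetical and chosen only for this illustration: as soon as a node has more than $m|A|$ children, some state is reached by more than $m$ of them, and adding one more child in that state leaves the capped counts seen by the automaton unchanged.

```python
from collections import Counter
from itertools import product

def state_to_duplicate(child_states, m):
    """Return a state reached by more than m children, or None if there is none."""
    if not child_states:
        return None
    state, count = Counter(child_states).most_common(1)[0]
    return state if count > m else None

def capped_profile(child_states, states, m):
    """Number of children in each state, capped at the threshold m."""
    counts = Counter(child_states)
    return tuple(min(counts[q], m) for q in states)

STATES = ("p", "q", "r")  # hypothetical state set, |A| = 3
M = 2                     # hypothetical counter threshold

# With m * |A| + 1 children, the pigeonhole principle always finds a state.
n_children = M * len(STATES) + 1
always_found = all(
    state_to_duplicate(assignment, M) is not None
    for assignment in product(STATES, repeat=n_children)
)
print(always_found)

kids = ["p", "q", "p", "r", "q", "r", "p"]
dup = state_to_duplicate(kids, M)
# Duplicating a child in that state does not change what the automaton sees.
print(capped_profile(kids, STATES, M) == capped_profile(kids + [dup], STATES, M))
```

The exhaustive check over all assignments is feasible here only because the example is tiny; it is meant as a sanity check of the argument, not as part of any decision procedure.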

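We now show that $T \simeq_{\kappa'} T'$; it follows that $T \simeq_{(\kappa',\lambda')} T'$ and this concludes the proof.

Lemma 5.11. $T \simeq_{\kappa'} T'$.

Proof. We need to show that $T \preceq_{\kappa'} T'$, $T' \preceq_{\kappa'} T$ and that the roots of $T$ and $T'$ have the
same $\kappa'$-type.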

Recall that in $T_1$ we distinguished between two kinds of nodes, those coming from $t$
and those coming from the loops that were inserted during the construction of $T_1$ from $t$.
We make the same distinction in $T$ by assuming that a node generated after a duplication
gets the same kind as its original copy.

Recall the definition of $B$ and of $C_\tau$ for $\tau \in B$ that was used for defining $T_1$ and $T'_1$
from $t$ and $t'$.

As for the binary tree case it suffices to show that for any node of $T$ coming from $t$
there is a node of $T'$ coming from $t'$ and having the same $\kappa'$-type. Hence the result follows
from the claim below, which is an adaptation of Claim 4.8.

Claim 5.12. Take two nodes $x$ in $t$ and $x'$ in $t'$, such that $x$ and $x'$ have the same
$(\kappa,\lambda)$-type. Let $z$ and $z'$ be the corresponding nodes in $T$ and $T'$. Then $z$ and $z'$ have the same
$\kappa'$-type.

Proof. Let x and x' be two nodes of t and t' with the same (k, A)-type. Let x\ and x[ be 
the corresponding nodes in Ti and T[. The same proof as Claim [4781 for the binary tree case 
shows that x\ and x^ have the same (k', A)-type. 

Let y be a child of x. Let $y_1$ be the node corresponding to y in $T_1$. Notice now that the
$(\kappa', \lambda)$-type of $y_1$ in $T_1$ is completely determined by the $(\kappa - 1, \lambda)$-type $\nu$ of y in t. Indeed,
by choice of $\kappa$, during the construction of $T_1$, a loop of type $\tau \in B$ will be inserted between
y and any descendant of y at distance at most $\beta(k,l) - 1$ from y. As $\kappa > \beta(k,l) + k$, the
relative positions below y where such a $C_\tau$ is inserted can be read from $\nu$. As the depth of
any $C_\tau$ is greater than $\kappa'$, from $\nu$ we can compute exactly the descendants of $y_1$ in $T_1$ up
to depth $\kappa'$. Hence $\nu$ determines the $(\kappa', \lambda)$-type of $y_1$.

It follows that two children of $x_1$ or of $x_1'$ have the same $(\kappa', \lambda)$-types iff they had the
same $(\kappa - 1, \lambda)$-types in t or in t'.

We now construct an isomorphism between the $\kappa'$-type of z and the one of z'. Let d
be the maximal distance between z and a node that is a descendant of z where a loop was 
inserted during the construction of T from t. We construct our isomorphism by induction 
on d. 

If $d = 0$ then the $(k, l)$-type of z is in $B$ and as z and z' have the same $(\kappa', \lambda)$-type with
$\kappa' > k$, the $(k, l)$-type of z' is the same as the one of z. Therefore the subtrees rooted at z
and z' are equal up to depth $\kappa'$ as they both start with a copy of $C_\tau$ and we are done.

Otherwise, as z and z' have the same $(\kappa', \lambda)$-type their roots must have the same labels.
Consider now a $(\kappa' - 1, \lambda)$-type $\mu$. By construction of T and T', z and z' must have the
same number of occurrences of children of type $\mu$. Indeed from the type these numbers must
match if one of them is smaller than $\lambda$ and by construction they are equal to n otherwise.
Hence we have a bijection between the children of z of type $\mu$ and the children of z' of type
$\mu$. From the text above we know that the $(\kappa', \lambda)$-type of these nodes is determined by the
$(\kappa - 1, \lambda)$-type of their copy in t or in t'. Because x and x' have the same $(\kappa, \lambda)$-type, the
corresponding $(\kappa - 1, \lambda)$-types are all equal and hence all the nodes of type $\mu$ actually have
the same $(\kappa', \lambda)$-type. By induction they are isomorphic up to depth $\kappa'$ and we are done. □

From Claim 5.12 the lemma follows as in the proof of Lemma 4.7 for binary trees. □

This concludes the proof of Proposition 5.6. □

5.2. Decision of ILT. In the idempotent case we can completely characterize ILT using 
closure properties. We show that membership in ILT corresponds to tameness together with 
an extra closure property denoted horizontal stutter reflecting the idempotent behavior. A 
tree language L is closed under horizontal stutter iff for any tree t and any node x of t, 
replacing $t|_x$ with two copies of $t|_x$ does not affect membership in L. Theorem 5.1 for ILT
is a consequence of the following theorem. 
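
The horizontal stutter operation can be sketched in Python as follows. This is only an illustration under stated assumptions: unordered trees are encoded as nested pairs `(label, children)` with `children` a tuple, a node is addressed by a path of child indices, and the names `horizontal_stutter`, `all_paths` and `in_sample_language` (a toy language chosen for the example) are hypothetical and do not come from the paper.

```python
def horizontal_stutter(tree, path):
    """Return a copy of `tree` in which the subtree at `path` occurs twice.

    `path` is a non-empty tuple of child indices leading from the root to
    the node x. The root is excluded here (an assumption of this sketch),
    since it has no sibling position in which to place a second copy.
    """
    label, children = tree
    i, rest = path[0], path[1:]
    if not rest:
        # Insert a second copy of the i-th child next to the original.
        return (label, children[:i + 1] + (children[i],) + children[i + 1:])
    new_child = horizontal_stutter(children[i], rest)
    return (label, children[:i] + (new_child,) + children[i + 1:])


def all_paths(tree, prefix=()):
    """Yield the paths of all non-root nodes of `tree`."""
    for i, child in enumerate(tree[1]):
        yield prefix + (i,)
        yield from all_paths(child, prefix + (i,))


def in_sample_language(tree):
    """Toy language: some node labelled 'a' has a child labelled 'b'."""
    label, children = tree
    if label == "a" and any(c[0] == "b" for c in children):
        return True
    return any(in_sample_language(c) for c in children)


# Duplicating any subtree leaves membership in the toy language unchanged.
t = ("a", (("c", (("a", (("b", ()),)),)), ("d", ())))
for p in all_paths(t):
    assert in_sample_language(horizontal_stutter(t, p)) == in_sample_language(t)
```

The toy language only looks at which child labels are present, not how many times, which is why it is insensitive to the duplication.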

Theorem 5.13. A regular unordered tree language is in ILT iff it is tame and closed under 
horizontal stutter. 

Proof. It is simple to see that tameness and closure under horizontal stutter are necessary 
conditions. We prove that they are sufficient. Take a regular tree language L and suppose 
that L is tame and closed under horizontal stutter. Then there exist k and l such that L
is $(k, l)$-tame. We show that if $t \simeq_{(k+1,1)} t'$ then $t \in L$ iff $t' \in L$. It follows that L is in ILT.
We first show a simple lemma stating that if two trees contain the same $(k+1, 1)$-types,
then we can pump them without affecting membership in L into two trees that contain the
same $(k+1, l)$-types.

Lemma 5.14. Let L be closed under horizontal stutter and let s and s' be two trees such that
$s \simeq_{(k+1,1)} s'$. Then there exist two trees S and S' such that:

• $s \in L$ iff $S \in L$.

• $s' \in L$ iff $S' \in L$.

• $S \simeq_{(k+1,l)} S'$.

Proof. S is constructed from s via a bottom-up procedure. Let x be a node of s. For each
subtree rooted at a child of x, we duplicate it l times using horizontal stutter. This does
not affect membership in L. After performing this for all nodes x of s we obtain a tree S
with the desired properties. The tree S' is obtained from s' in the same way. □
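
The pumping step of Lemma 5.14 can be illustrated by the following Python sketch. It rests on assumptions that are not spelled out in this section: trees are nested pairs `(label, children)`, the $(k, l)$-type of a node is taken to be its label together with the $(k-1, l)$-types of its children counted up to threshold l, and "duplicating l times" is read as keeping l copies of each child subtree. The names `ktype`, `types_of`, `same_types` and `pump` are hypothetical.

```python
from collections import Counter


def ktype(tree, k, l):
    """(k, l)-type of the root, under the definition assumed in the lead-in."""
    label, children = tree
    if k == 0:
        return (label,)
    counts = Counter(ktype(c, k - 1, l) for c in children)
    # Child types are counted only up to the threshold l.
    return (label, frozenset((ty, min(n, l)) for ty, n in counts.items()))


def types_of(tree, k, l):
    """Set of (k, l)-types occurring at the nodes of `tree`."""
    found = {ktype(tree, k, l)}
    for child in tree[1]:
        found |= types_of(child, k, l)
    return found


def same_types(s, t, k, l):
    """Same root type and same set of occurring (k, l)-types."""
    return ktype(s, k, l) == ktype(t, k, l) and types_of(s, k, l) == types_of(t, k, l)


def pump(tree, l):
    """Bottom-up: pump every child subtree, then keep l copies of each."""
    label, children = tree
    pumped = [pump(c, l) for c in children]
    return (label, tuple(c for c in pumped for _ in range(l)))


s = ("a", (("b", ()), ("b", ()), ("c", ())))
s2 = ("a", (("b", ()), ("c", ()), ("c", ())))
assert same_types(s, s2, 2, 1)        # same (2, 1)-types
assert not same_types(s, s2, 2, 2)    # but different (2, 2)-types
assert same_types(pump(s, 2), pump(s2, 2), 2, 2)
```

After pumping, every child type that occurs at all occurs at least l times, so counting up to l no longer distinguishes the two trees.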

Let T and T' be constructed from t and t' using Lemma 5.14. Let $T_1, \ldots, T_n$ be the children
of the root of T and $T'_1, \ldots, T'_{n'}$ the children of the root of T'. Let $T''$ be the tree whose root
is the same as T and T' and whose children are the sequence of trees $T_1, \ldots, T_n, T'_1, \ldots, T'_{n'}$.
We show that $T'' \in L$ iff $T \in L$ and $T'' \in L$ iff $T' \in L$. It will follow that $T \in L$ iff $T' \in L$
and by Lemma 5.14 that $t \in L$ iff $t' \in L$, which ends the proof.

To show that $T'' \in L$ iff $T \in L$ we use horizontal stutter and Lemma 5.7. As the roots
of T and T' have the same $(k+1, l)$-type, for each $T'_i$ there exists a tree $T_j$ such that its
root has the same $(k, l)$-type as $T'_i$. Fix such a pair $(i, j)$. Let $S$ be the tree obtained by
duplicating $T_j$ in T. By closure under horizontal stutter $T \in L$ iff $S \in L$. But now $S = D T_j$
for some context D such that $T \preccurlyeq_{(k+1,l)} D$. Altogether we have that: the roots of $T'_i$ and
$T_j$ have the same $(k, l)$-type (by choice of i and j), $T'_i \preccurlyeq_{(k+1,l)} D$ (as $T'_i \preccurlyeq_{(k+1,l)} T'$ and
$T \simeq_{(k+1,l)} T'$) and $T_j \preccurlyeq_{(k+1,l)} D$ (as $T_j$ is part of T hence of D). We can therefore apply
Lemma 5.7 and $D T'_i \in L$ iff $D T_j \in L$.

Repeating this argument for all i eventually yields the tree $T''$. This proves that $T'' \in L$
iff $T \in L$. By symmetry we also have $T'' \in L$ iff $T' \in L$, which concludes the proof. □



6. TAMENESS IS NOT SUFFICIENT 

Over strings tameness characterizes exactly LT as vertical swap and vertical stutter are
exactly the extensions to trees of the known equations for LT (recall Section 2). Over trees
this is no longer the case. In this section we provide an example of a language that is tame
but not LT. To simplify the presentation we assume that nodes may have between 0
and 3 children; this can easily be turned into a binary tree language. All trees in our
language L have the same structure consisting of a root of label a from which exactly three
sequences of nodes with only one child (strings) are attached. The trees in L have therefore
exactly three leaves, and those must have three distinct labels among $\{h_1, h_2, h_3\}$. The
labels of two of the branches, not including the root and the leaf, must form a sequence in
the language $b^*cd^*$. The third branch must form a sequence in the language $b^*c'd^*$. We
assume that b, c, c' and d are distinct labels. Note that the language does not specify
which leaf label among $\{h_1, h_2, h_3\}$ is attached to the branch containing c'.
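
To make the definition of L concrete, here is a minimal Python membership test. It is a sketch under assumptions: trees are nested pairs `(label, children)`, the leaf labels are written `"h1"`, `"h2"`, `"h3"`, the label c' is encoded internally as the character `C` so that regular expressions apply, and the helper names `chain`, `read_branch` and `in_L` are hypothetical.

```python
import re

# One character per inner label; "C" is an internal encoding of c'.
ENCODE = {"b": "b", "c": "c", "c'": "C", "d": "d"}


def chain(labels, leaf):
    """Build a branch: a unary chain of `labels` ending in a leaf."""
    node = (leaf, ())
    for label in reversed(labels):
        node = (label, (node,))
    return node


def read_branch(node):
    """Return (word of inner labels, leaf label), or None if a node branches."""
    word = []
    while node[1]:
        if len(node[1]) != 1:
            return None
        word.append(ENCODE.get(node[0], "?"))
        node = node[1][0]
    return "".join(word), node[0]


def in_L(tree):
    label, children = tree
    if label != "a" or len(children) != 3:
        return False
    branches = [read_branch(c) for c in children]
    if any(b is None for b in branches):
        return False
    # Three leaves with three distinct labels among h1, h2, h3.
    if {leaf for _, leaf in branches} != {"h1", "h2", "h3"}:
        return False
    plain = sum(bool(re.fullmatch(r"b*cd*", w)) for w, _ in branches)
    primed = sum(bool(re.fullmatch(r"b*Cd*", w)) for w, _ in branches)
    return plain == 2 and primed == 1


good = ("a", (chain(["b", "c", "d"], "h1"),
              chain(["c"], "h2"),
              chain(["b", "b", "c'"], "h3")))
bad = ("a", (chain(["b", "c", "d"], "h1"),
             chain(["c"], "h2"),
             chain(["b", "c", "d"], "h3")))
assert in_L(good) and not in_L(bad)
```

The test never looks at which leaf sits below the branch containing c', matching the remark that the language leaves this unspecified.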

The reader can verify that L is 1-tame. We show that L is not in LT. For every integer k,
the two trees t and t' depicted below are such that $t \in L$, $t' \notin L$, while $t \simeq_k t'$.

7. DISCUSSION

We have shown a decidable characterization for the class of locally testable regular tree 
languages both for ranked trees and unranked unordered trees. 

Complexity. The decision procedure for deciding membership in LT as described in this
paper requires a time which is a tower of several exponentials in the size of the deterministic
minimal automaton recognizing L. This is most likely not optimal. In comparison, over
strings, membership in LT can be decided in polynomial time [Pin05]. Essentially our
procedure requires two steps. The first step shows that if a regular language L is in LT then
it is $\kappa$-locally testable for some $\kappa$ computable from the minimal deterministic automaton
A recognizing L. The $\kappa$ obtained in Proposition 4.1 is doubly exponential in the size of A.
In comparison, over strings, this $\kappa$ can be shown to be polynomial. For trees we did not
manage to get a smaller $\kappa$ but we have no example where even one exponential would be
necessary.

Our second step tests whether L is $\kappa$-locally testable once $\kappa$ is fixed. This was easy to
do using a brute force algorithm requiring several exponentials in $\kappa$. It is likely that this
can be optimized but we did not investigate this direction.

However, for unranked unordered trees we have seen in Theorem 5.13 that in the case
of ILT it is enough to test for tameness. The naive procedure for deciding tameness is
exponential in the size of A. But the techniques presented in [BS10] for the case of LTT
easily extend to the closure properties of tameness, and provide an algorithm running in time
polynomial in the size of A. Hence membership in ILT can be tested in time polynomial in
the size of the minimal deterministic bottom-up tree automaton recognizing the language.

Logical characterization. There is a logical characterization of languages that are locally 
testable. It corresponds to those languages definable by formulas containing the temporal 
predicates G and X where G stands for "everywhere in the tree" while X stands for "child".
In the binary tree case, we also require two predicates distinguishing the left child from the 
right child. In the unranked unordered setting the logic above is closed under bisimulation 
and therefore corresponds to ILT. This shows that in a sense ILT is the natural extension 
of LT to the unranked setting. 
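
A small evaluator makes the two predicates concrete. The sketch below is an illustration only, with several assumptions beyond the text: formulas are nested tuples, Boolean connectives and label tests are added, X is read existentially ("some child satisfies"), and G quantifies over all nodes of the whole tree. The names `nodes` and `holds` are hypothetical.

```python
def nodes(tree):
    """Yield every node (as a subtree) of `tree`."""
    yield tree
    for child in tree[1]:
        yield from nodes(child)


def holds(phi, node, root):
    """Evaluate formula `phi` at `node` inside the tree `root`."""
    op = phi[0]
    if op == "label":
        return node[0] == phi[1]
    if op == "not":
        return not holds(phi[1], node, root)
    if op == "and":
        return holds(phi[1], node, root) and holds(phi[2], node, root)
    if op == "X":
        # Assumed reading: some child of the current node satisfies phi[1].
        return any(holds(phi[1], c, root) for c in node[1])
    if op == "G":
        # Assumed reading: phi[1] holds at every node of the whole tree.
        return all(holds(phi[1], n, root) for n in nodes(root))
    raise ValueError(f"unknown operator: {op}")


# "Every node labelled a has a child labelled b."
phi = ("G", ("not", ("and", ("label", "a"),
                     ("not", ("X", ("label", "b"))))))

t = ("a", (("b", ()), ("c", ())))
t_dup = ("a", (("b", ()), ("b", ()), ("c", ())))   # one child duplicated
t_bad = ("a", (("c", ()),))
assert holds(phi, t, t) and holds(phi, t_dup, t_dup)
assert not holds(phi, t_bad, t_bad)
```

Under these readings the formula cannot tell `t` from `t_dup`, which is the kind of insensitivity to duplicated children that the bisimulation remark refers to.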

Open problem. It would be interesting to obtain a different characterization of LT based
on a finite number of conditions such as the ones characterizing tameness. This would be a
more satisfying result and would most likely provide a more efficient algorithm for deciding
LT.



References 

[Boj07a] M. Bojańczyk. A new algorithm for testing if a regular language is locally threshold testable.
Information Processing Letters, 104(3):91-94, 2007.

[Boj07b] M. Bojańczyk. Two-way unary temporal logic over trees. In IEEE Symposium on Logic in
Computer Science (LICS), pages 121-130, 2007.

[BP89] D. Beauquier and J-E. Pin. Factors of words. In Intl. Coll. on Automata, Languages and
Programming (ICALP), pages 63-79, 1989.

[BS73] J. A. Brzozowski and I. Simon. Characterizations of locally testable languages. Discrete
Mathematics, 4:243-271, 1973.

[BS08] M. Bojańczyk and L. Segoufin. Tree languages defined in first-order logic with one quantifier
alternation. In Intl. Coll. on Automata, Languages and Programming (ICALP), 2008.

[BS10] M. Benedikt and L. Segoufin. Regular languages definable in FO and FOmod. ACM Trans. on
Computational Logic, 11(1), 2010.

[BSS08] M. Bojańczyk, L. Segoufin, and H. Straubing. Piecewise testable tree languages. In IEEE
Symposium on Logic in Computer Science (LICS), 2008.

[BW06] M. Bojańczyk and I. Walukiewicz. Characterizing EF and EX tree logics. Theoretical Computer
Science, 358:255-272, 2006.

[BW07] M. Bojańczyk and I. Walukiewicz. Forest algebras. In Automata and Logic: History and
Perspectives, pages 107-132. Amsterdam University Press, 2007.

[CGJ+07] Hubert Comon, Max Dauchet, Remy Gilleron, Florent Jacquemard, Christof Löding, Denis Lugiez,
Sophie Tison, and Marc Tommasi. Tree automata techniques and applications. Available
electronically at http://tata.gforge.inria.fr/, 2007.

[EW05] Z. Ésik and P. Weil. Algebraic characterization of regular tree languages. Theoretical Computer
Science, 340:291-321, 2005.

[McN74] R. McNaughton. Algebraic decision procedures for local testability. Mathematical Systems Theory,
8(1):60-76, 1974.

[Pin05] J-E. Pin. The expressive power of existential first order sentences of Büchi's sequential calculus.
Discrete Mathematics, 291:155-174, 2005.

[Pla08] T. Place. Characterization of logics over ranked tree languages. In Conference on Computer
Science Logic (CSL), pages 401-415, 2008.

[Str85] H. Straubing. Finite semigroup varieties of the form V*D. J. of Pure and Applied Algebra,
36:53-94, 1985.

[Til87] B. Tilson. Categories as algebra: an essential ingredient in the theory of monoids. J. of Pure and
Applied Algebra, 48:83-198, 1987.

[TW85] D. Thérien and A. Weiss. Graph congruences and wreath products. J. of Pure and Applied Algebra,
36:205-215, 1985.

[Wil96] T. Wilke. An algebraic characterization of frontier testable tree languages. Theoretical Computer
Science, 154(1):85-106, 1996.



This work is licensed under the Creative Commons Attribution-NoDerivs License. To view
a copy of this license, visit http://creativecommons.org/licenses/by-nd/2.0/ or send a
letter to Creative Commons, 171 Second St, Suite 300, San Francisco, CA 94105, USA, or
Eisenacher Strasse 2, 10777 Berlin, Germany.



